
Canadian Strategic Highway Research Program
C-SHRP Bayesian Modelling: A User's Guide

Appendix B
BAYESIAN REGRESSION THEORY
Introduction

This chapter provides the equations for classical and Bayesian regression in matrix form. While the derivation and proof of these equations are beyond the scope of this user's guide, the equations provided define what is needed to set up a Bayesian regression in either B-STAT or XLBayes and to interpret the output. The suggestions for further reading given at the end of this chapter are a starting point for finding out more about the theory of Bayesian regression. A primer in general statistical concepts is provided in the notes from the Training Sessions in Bayesian Statistical Methods and Software (available on the CD-ROM version of this user's guide).

Classical Regression

Classical regression is a method that most analysts are familiar with. For this reason it provides a good frame of reference for describing the terminology and techniques used in Bayesian regression. This section presents the equations for classical regression in matrix form.

Linear regression assumes an additive-linear relationship between the dependent and independent variables. The standard additive-linear regression equation is shown in Equation 1.

Equation 1
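For reference, the additive-linear form described above is conventionally written as follows. This is a sketch in standard textbook notation and may differ in layout from the guide's own typeset equation; the symbols $y$, $x_j$ and $k$ (the number of independent variables) are assumptions, while $b_0$ and the random error term $e$ follow the text.

```latex
y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e
```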
A set of experimental data is used to solve for the unknown regression coefficients. This data is defined in Table 1.

Table 1 - Experimental Data

The experimental data is used to define matrix X and vector Y. X is the matrix of the independent data. A column of ones (i.e. a constant) has been inserted in this case because a constant, b_0, is being calculated for the regression equation, as shown in Equation 1. If no constant is being calculated, the column of ones is omitted.
Y is the vector of the dependent data.
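The construction of X and Y can be sketched in a few lines of NumPy. The data values and the variable names (x1, y) are hypothetical and chosen only for illustration; B-STAT and XLBayes build these structures internally.

```python
import numpy as np

# Hypothetical experimental data: four observations of one independent
# variable (x1) and of the dependent variable (y). Illustrative values only.
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 4.0, 6.0])

# X: a leading column of ones (for the constant b_0) followed by one column
# per independent variable. Omit the ones column if no constant is fitted.
X = np.column_stack([np.ones_like(x1), x1])

# Y: the vector of dependent-variable observations.
Y = y

print(X.shape)  # (4, 2): n = 4 observations, k + 1 = 2 coefficients
```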
Coefficient Means

Once the experimental data has been prepared, the ordinary least squares (OLS) regression procedure is used to solve for the mean of the regression coefficients. This is carried out by evaluating Equation 2. In the matrix notation used, M^t indicates the transpose of matrix M, and the inverse of matrix M is denoted M^-1.

Equation 2

Evaluating Equation 2 yields the vector of regression coefficient means, b.
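As a minimal sketch of this step, the following assumes the usual closed-form OLS solution, b = (X^t X)^-1 X^t Y, which is consistent with the transpose and inverse notation introduced above. The data values are hypothetical.

```python
import numpy as np

# Hypothetical experimental data (constant column plus one variable).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 3.0, 4.0, 6.0])

# b = (X^t X)^-1 X^t Y, solved as a linear system rather than by forming
# the inverse explicitly, which is numerically more stable.
b = np.linalg.solve(X.T @ X, X.T @ Y)

print(b)  # constant b_0 first, then the slope b_1
```

For this data the result is approximately b_0 = 1.1 and b_1 = 1.6.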
Degrees of Freedom

The number of degrees of freedom, v, for the regression model is calculated using Equation 3.

Equation 3

Residual Variance

Under the assumptions of OLS regression, the random error term, e, is normally distributed with a mean of zero. The variance of the random error term is estimated by the variance of the residuals. A residual is the difference between an actual observation of the dependent variable in the experimental data and the prediction made for it using the model. The variance of the residuals is calculated using Equation 4.

Equation 4
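The two calculations above can be sketched as follows. The sketch assumes the standard definitions: v = n - k - 1 for n observations and k independent variables plus a constant, and a residual variance equal to the sum of squared residuals divided by v. The symbol names n, k and Se2 and the data values are assumptions made for illustration.

```python
import numpy as np

# Hypothetical experimental data (constant column plus one variable).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 3.0, 4.0, 6.0])

n = X.shape[0]          # number of observations
k = X.shape[1] - 1      # number of independent variables (constant excluded)

# Coefficient means from OLS.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Degrees of freedom: observations minus estimated coefficients.
v = n - k - 1

# Residuals: actual observations minus model predictions.
Y_hat = X @ b
residuals = Y - Y_hat

# Residual variance: sum of squared residuals divided by v.
Se2 = np.sum(residuals ** 2) / v
```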
Equation 4 may be re-written in matrix notation.

Equation 5

Variance/Covariance Matrix

The variance of the regression coefficients is typically used in classical regression to evaluate their statistical significance (e.g. with Student's t-test). The variance of each coefficient can be determined by calculating the variance/covariance matrix using Equation 6. The variance/covariance matrix is also an important tool in defining priors for Bayesian regression.

Equation 6

The variance/covariance matrix is symmetric about its main diagonal. The diagonal terms are the variances of the individual regression coefficients, and the off-diagonal terms are the covariances between pairs of regression coefficients.
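A short sketch of these quantities follows, assuming the standard OLS forms: the residual variance in matrix notation, Se2 = (Y - Xb)^t (Y - Xb) / v, and the variance/covariance matrix V = Se2 (X^t X)^-1. The names Se2, V, std_err and t_stats and the data values are illustrative assumptions.

```python
import numpy as np

# Hypothetical experimental data (constant column plus one variable).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 3.0, 4.0, 6.0])

n, p = X.shape                      # p = k + 1 coefficients
XtX = X.T @ X
b = np.linalg.solve(XtX, X.T @ Y)   # coefficient means
v = n - p                           # degrees of freedom

# Residual variance in matrix notation.
resid = Y - X @ b
Se2 = (resid @ resid) / v

# Variance/covariance matrix of the regression coefficients.
V = Se2 * np.linalg.inv(XtX)

# Diagonal terms are coefficient variances; their square roots are the
# standard errors used in a t-test of each coefficient.
std_err = np.sqrt(np.diag(V))
t_stats = b / std_err
```

For this data V is approximately [[0.07, -0.03], [-0.03, 0.02]], which is symmetric as described above.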
Bayesian Regression

In keeping with the philosophy of Bayesian statistics, Bayesian regression recasts classical regression into a more general form that includes both prior information and experimental data. The equations used in Bayesian regression closely parallel those for classical regression, and the resulting linear regression equation is in the same form as the classical result. In fact, results identical to the classical regression result can be obtained by making the prior information sufficiently diffuse or vague.

This section provides a summary of the Bayesian regression equations and procedures used within the B-STAT and XLBayes software. The reader should compare the classical form of these equations to the Bayesian form to get a clear understanding of the similarities and differences between the two methods. This section has been organized into parts which reflect the major tasks in performing Bayesian regression.
Bayesian Regression Task 1: Specifying Prior Information

The first step in performing Bayesian regression is specifying the required prior information. This section details two types of priors, the N-prior and the G-prior, both of which are supported by XLBayes and B-STAT. Both the N-prior and the G-prior take the form of a regression equation. The prior equation always has the same form as the equation used for a classical regression on the experimental data.

Equation 7
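Because the prior equation has the same form as the classical regression equation, it can be sketched by attaching a prior subscript to each coefficient and to the error term. The subscript layout below is an assumption made for illustration and may differ from the guide's own typesetting.

```latex
y = b_{pr,0} + b_{pr,1} x_1 + b_{pr,2} x_2 + \cdots + b_{pr,k} x_k + e_{pr}
```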
Prior Regression Coefficients

The prior regression coefficients are estimates of the mean value of each regression coefficient. The vector of regression coefficient means for the prior is b_pr.
There are a number of possible ways to obtain the means vector for the prior. These include direct estimation of the values by an expert, OLS regression (i.e. Equation 2) on 'old data' that may exist from previous experiments, or OLS regression on 'pseudo data' obtained from structured interviews.

Prior Degrees of Freedom

The number of degrees of freedom for the prior, v_pr, has the same meaning in the context of an N-prior or G-prior as it does in OLS regression. Where the prior is based on data of some type, Equation 8 may be used to calculate the number of degrees of freedom. Where the prior is not based on data, the degrees of freedom for the prior is sometimes estimated to be of the same order of magnitude as for a classical regression on the experimental data. In general, provided v_pr and v (i.e. the number of degrees of freedom for a classical regression on the experimental data) are sufficiently large, results will not be too sensitive to changes in the value of v_pr.

Equation 8

Prior Residual Variance

The prior residual variance is the variance of the random error term e_pr. It is denoted Se^2_pr (the square of the prior standard error) and is analogous to the classical regression parameter Se^2.

Equation 9

Prior Precision Matrix

The N-prior and G-prior differ only in the way the prior precision matrix is determined. The N-prior requires a variance/covariance matrix to determine the prior precision matrix, whereas the G-prior uses a set of independent data to calculate the prior precision matrix.

Calculating the Prior Precision Matrix - N-prior

The N-prior uses the variance/covariance matrix for the prior to calculate the precision matrix. The variance/covariance matrix has the same meaning and interpretation for the prior regression coefficients as does the variance/covariance matrix for the regression coefficients in a classical regression.
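When 'old data' is available, the prior inputs described above can be obtained by running a classical regression on that data. The sketch below assumes that the prior degrees of freedom, residual variance and variance/covariance matrix mirror their classical counterparts; the data values and the names x_old, y_old, v_pr, Se2_pr and V_pr are hypothetical.

```python
import numpy as np

# Hypothetical 'old data' from a previous experiment (illustrative values).
x_old = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_old = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

X_old = np.column_stack([np.ones_like(x_old), x_old])
n_old, p = X_old.shape            # p = k + 1 coefficients (constant included)

# Prior coefficient means: OLS on the old data.
b_pr = np.linalg.solve(X_old.T @ X_old, X_old.T @ y_old)

# Prior degrees of freedom, assumed to mirror the classical n - k - 1.
v_pr = n_old - p

# Prior residual variance, assumed to mirror the classical calculation.
resid = y_old - X_old @ b_pr
Se2_pr = (resid @ resid) / v_pr

# Prior variance/covariance matrix, assumed to mirror the classical form.
V_pr = Se2_pr * np.linalg.inv(X_old.T @ X_old)
```

For this data the prior means are approximately 2.2 and 0.6, with v_pr = 3 and Se2_pr = 0.8.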
Bayesian regression results are quite sensitive to estimates of the prior variance, as discussed in Section 4.2.2. Obtaining a reasonable estimate of the variance/covariance matrix from data collected in expert interviews is, in general, more difficult than obtaining an estimate of the regression coefficient means from this data. In some cases, derivation of the variance/covariance matrix from pseudo-data will yield low estimates of variances. This issue is discussed further in Sections 3.3.2 and 3.4.1. Using the G-prior is one way of addressing potential problems with determining the variance/covariance matrix for the prior.

The precision matrix for the prior, A, is calculated for the N-prior using Equation 10.

Equation 10

Calculating the Prior Precision Matrix - G-prior

The G-prior uses a set of independent variable observations to calculate the prior precision matrix. In addition to the set of independent variable data, the G-prior requires estimates of the prior regression coefficient means, prior degrees of freedom, prior residual variance, and a constant known as the G-prior factor.

G-prior Independent Variable Data

The G-prior independent variable data is a set of independent variable observations similar to the data used to perform classical regression. The difference is that no associated dependent variable observations are required.
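The two routes to the prior precision matrix A can be sketched side by side. The formulas used here are assumptions based on the standard natural-conjugate treatment, not transcriptions of the guide's equations: A = Se2_pr V_pr^-1 for the N-prior (the classical variance/covariance relationship inverted), and A = g Xg^t Xg for the G-prior, where Xg is the G-prior independent variable data and g is the G-prior factor. All values are hypothetical.

```python
import numpy as np

# Hypothetical prior inputs (illustrative values only).
Se2_pr = 0.8                                   # prior residual variance
V_pr = np.array([[0.88, -0.24],
                 [-0.24, 0.08]])               # prior variance/covariance matrix

# N-prior (assumed form): precision from the variance/covariance matrix.
A_n = Se2_pr * np.linalg.inv(V_pr)

# G-prior (assumed form): precision from independent variable data only.
# No dependent variable observations are needed.
x_g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_g = np.column_stack([np.ones_like(x_g), x_g])
g = 1.0                                        # G-prior factor
A_g = g * (X_g.T @ X_g)
```

In this illustration the two matrices coincide, because V_pr was chosen to match what an OLS fit on X_g with a residual variance of 0.8 would produce. In general the N-prior and G-prior precision matrices differ.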
There are some options for the source of the G-prior independent variable set. Two obvious choices are the data set used to develop the prior means and the experimental data that is being analyzed. If the experimental data is used (and g, as discussed in the following paragraph, is equal to 1), the posterior regression coefficient means will always be the average of the prior regression coefficient means and the regression coefficient means determined from a classical regression on the experimental data.

G-prior Factor

The G-prior factor is a positive real number that is used as a weight in the calculation of the prior precision matrix. The G-prior factor is used to increase or decrease the influence of the prior in the calculation of the posterior. The G-prior factor is denoted g. A typical value of g is 1. This gives the prior precision matrix equal weight with the precision matrix calculated from the experimental data. The greater the value of g, the more influence the prior will have on the posterior. The prior precision matrix for the G-prior is calculated using Equation 11.

Equation 11

Bayesian Regression Task 2: Analyze Experimental Data

The second step, analyzing the experimental data, is the same as for classical regression except for one additional calculation, the precision matrix for the experimental data. The definitions of b, v, X, Y, and all other terms are the same as defined for classical regression.

Experimental Precision Matrix

The precision matrix for the experimental data, H, is a term isolated from Equation 5.

Equation 12

Using this terminology, Equation 5 can be re-written as Equation 13.

Equation 13

Bayesian Regression Task 3: Calculating the Posterior

The final step in performing Bayesian regression is to calculate the posterior results by combining the prior with the experimental data.
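A minimal end-to-end sketch of this combination follows. It assumes the standard natural-conjugate forms, which match the verbal descriptions in this chapter but are not transcriptions of the guide's equations: H = X^t X, M = A + H, b_post = M^-1 (A b_pr + H b), v_post = v_pr + v + k + 1, a posterior residual variance pooled over the prior and experimental terms plus the two deviation terms, and V_post = Se2_post M^-1. The prior values and all variable names are hypothetical.

```python
import numpy as np

# Experimental data (hypothetical values).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 3.0, 4.0, 6.0])
n, p = X.shape                       # p = k + 1 coefficients

# Task 2: classical quantities plus the experimental precision matrix H.
H = X.T @ X
b = np.linalg.solve(H, X.T @ Y)
v = n - p
resid = Y - X @ b
Se2 = (resid @ resid) / v

# Task 1: a hypothetical G-prior built on the experimental X with g = 1.
b_pr = np.array([0.9, 2.0])
v_pr = 2
Se2_pr = 0.2
g = 1.0
A = g * (X.T @ X)

# Task 3: posterior precision, means, degrees of freedom and variances.
M = A + H
b_post = np.linalg.solve(M, A @ b_pr + H @ b)
v_post = v_pr + v + p
d_exp = b - b_post                   # deviation from experimental means
d_pr = b_pr - b_post                 # deviation from prior means
Se2_post = (v_pr * Se2_pr + v * Se2
            + d_exp @ H @ d_exp + d_pr @ A @ d_pr) / v_post
V_post = Se2_post * np.linalg.inv(M)
```

Because the G-prior here uses the experimental data with g = 1, b_post comes out as the simple average of b_pr and b (approximately 1.0 and 1.8), in line with the averaging property noted for the G-prior.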
Posterior Precision Matrix

The posterior precision matrix, M, is calculated by adding the prior precision matrix to the experimental data precision matrix.

Equation 14

Posterior Regression Coefficients

Next, the posterior regression coefficients are calculated as a weighted average of the prior regression coefficients and the experimental regression coefficients. The associated precision matrices in each case are used for the weights. The posterior precision matrix is used to normalize the results:

Equation 15

Posterior Degrees of Freedom

The posterior degrees of freedom are calculated by adding the prior degrees of freedom and experimental degrees of freedom and adding the number of coefficients in the functional form. Equation 16 adds one to k, assuming that a constant b_0 is present in the regression equation.

Equation 16

Posterior Standard Error

The posterior standard error is calculated by adding the prior standard error to the experimental data standard error and adding two additional factors. These account for the deviation of the posterior regression coefficients from the experimental regression coefficients and the deviation of the posterior regression coefficients from the prior regression coefficients.

Equation 17

Posterior Variance/Covariance Matrix

The posterior variance/covariance matrix is calculated using Equation 18.

Equation 18

Summary

This chapter has reviewed some basic statistical concepts and the theory of both classical and Bayesian regression. All of the calculations necessary to perform a Bayesian regression have been introduced. The remaining chapters in this user's guide are application oriented, dealing with the practicalities of conducting a Bayesian regression analysis. For those seeking a more detailed discussion of the theory of Bayesian regression, a large number of references are available. Raiffa & Schlaifer and Zellner, as listed in the suggested further reading for this chapter, both provide an excellent discussion.

References

Abowd, J.M., Moulton, B.R. and Zellner, A., User's Guide to PC-BRAP, H.G.B. Alexander Research Foundation, Graduate School of Business, University of Chicago, 1984.

Suggestions for Further Reading

Press, S., Bayesian Statistics: Principles, Models and Applications, John Wiley and Sons, New York, 1989.

Raiffa, H. and Schlaifer, R., Applied Statistical Decision Theory, Division of Research, Graduate School of Business Administration, Harvard University, 1961.

Zellner, A., An Introduction to Bayesian Inference in Econometrics, Robert E. Krieger Publishing Co., Malabar, Florida, 1987.