Canadian Strategic Highway Research Program
C-SHRP Bayesian Modelling:
A User's Guide


Chapter Three


10 STEP TEMPLATE - DATA AND ANALYSIS

This chapter deals with Steps 6 and 7 of the template: developing a prior, assembling the experimental data, and completing the Bayesian regression calculations using XLBayes or BSTAT.

3.1 Develop Prior & Assemble Data - Step 6

Step 6 focuses on developing the prior and assembling the experimental data. As discussed in Section 1.4, the prior is simply any information that is known about the system being modelled before the experimental data is collected. This is the essence of the Bayesian approach and is what differentiates Bayesian regression from the classical approach. The prior may be derived either subjectively using expert judgement or objectively based on existing data or models. Both approaches require that the prior information be put into either an 'N-prior' or 'G-prior' format.

Both the N-prior and the G-prior summarize a linear regression which represents the prior state of knowledge in the Bayesian regression calculation. The prior includes the coefficients of the linear regression equation along with corresponding regression statistics such as the variances of the regression coefficients. The regression statistics indicate the certainty of the prior and are used to weigh the balance between the prior and the data in the Bayesian regression calculation. A brief overview of the information required to define an N-prior or a G-prior is provided in this section. A more thorough discussion in the context of the matrix equations for a classical and Bayesian regression analysis is contained in Appendix B.

Table 3-1: Required Prior Information

Prior Information             Required for N-Prior?   Required for G-Prior?
Means Vector                  Yes                     Yes
Variance/Covariance Matrix    Yes                     No
G-Prior Data Set              No                      Yes
G-Prior Factor                No                      Yes
Residual Variance             Yes                     Yes
Degrees of Freedom            Yes                     Yes

Vector of Regression Coefficient Means

The regression coefficients are summarized in both the N-prior and the G-prior using the means vector. This vector consists of the mean estimate for the value of each regression coefficient. The vector of regression coefficient means for the prior is denoted $\mathbf{b}_{pr}$.


Table 3-2

$$\mathbf{b}_{pr} = \begin{bmatrix} b_{pr0} \\ b_{pr1} \\ \vdots \\ b_{prk} \end{bmatrix}$$

There are a number of possible ways to obtain the means vector for the prior. These include direct estimation of the values by an expert, classical regression on 'old data' that may exist from previous experiments, or classical regression on 'pseudo data' obtained from structured interviews.

Variance/Covariance Matrix

The N-prior uses the variance-covariance matrix for the prior in the Bayesian regression calculations. The variance-covariance matrix indicates the uncertainty in the prior means. The diagonal of the matrix contains the variance of each regression coefficient. This is the same variance that underlies the Student's t-test, which is often used to judge the statistical significance of a regression coefficient estimate in classical regression. The off-diagonal elements of the matrix contain the covariances between regression coefficients. Unlike the coefficient variances, the covariances are not typically used in a classical regression analysis, and most readers will be less familiar with this statistic.

The variance-covariance matrix can be obtained by direct estimation by experts, although this requires intimate familiarity with the inter-relationships between variables. Generally the variance-covariance matrix is obtained by running a classical regression on the data used to derive the prior coefficient means. These data may come from previous experiments or be 'pseudo-data' obtained from structured interviews. The classical regression option in both BSTAT and XLBayes provides the variance-covariance matrix as part of the output.

G-Prior Dataset

The G-Prior option is typically used when the coefficient means have been estimated directly by experts. The G-prior derives the variance covariance matrix for the coefficient means based on a set of independent variable data. The data set for the G-prior usually comes from prior experiments or pseudo data derived from structured interviews.

G-prior Factor

The G-prior factor is used to increase or decrease the influence of the prior in the calculation of the posterior. The G-prior factor is denoted g. A typical value of g is 1. This essentially gives the prior variance/covariance matrix equal weight with that for the experimental data. (In fact g weights the precision matrix, which can be derived in part from the variance covariance matrix. This is shown more fully in Appendix B.) The greater the value of g, the more influence the prior will have on the posterior.
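The effect of g can be illustrated with a small numerical sketch. The code below is not the XLBayes or BSTAT calculation (that is defined in Appendix B); it assumes one common textbook convention in which the prior precision is g times X'X of the G-prior data set, and it uses made-up data and a hypothetical function name. It also assumes the experimental data doubles as the G-prior data set.

```python
import numpy as np

def g_prior_posterior_mean(b_prior, X_g, g, X, y):
    """Posterior coefficient means under an ASSUMED g-prior convention.

    b_prior : prior coefficient means (constant first)
    X_g     : independent-variable data set supplied for the G-prior
    g       : G-prior factor; larger values give the prior more weight
    X, y    : experimental data (X includes a column of ones)
    """
    A_prior = g * (X_g.T @ X_g)          # assumed prior precision
    A_data = X.T @ X                     # precision contributed by the data
    rhs = A_prior @ b_prior + X.T @ y
    return np.linalg.solve(A_prior + A_data, rhs)

# Made-up experimental data: rut depth against air voids only.
rng = np.random.default_rng(0)
air_voids = rng.uniform(2.0, 5.0, 20)
X = np.column_stack([np.ones(20), air_voids])
y = 14.0 - 1.0 * air_voids + rng.normal(0.0, 0.5, 20)

b_prior = np.array([10.0, 0.0])          # made-up expert means

for g in (0.1, 1.0, 10.0):
    print(g, g_prior_posterior_mean(b_prior, X, g, X, y))
```

Under these assumptions, g = 1 places the posterior means halfway between the prior means and the classical regression estimates, and larger values of g pull the posterior toward the prior.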

Prior Degrees of Freedom

The number of degrees of freedom for the prior, $\nu_{pr}$, has the same meaning in the context of an N-prior or G-prior as it does in classical regression. Where the prior is based on data of some type, Equation 3.1 may be used to calculate the number of degrees of freedom. In general, provided $\nu_{pr}$ and $\nu$ (i.e. the number of degrees of freedom for a classical regression on the experimental data) are sufficiently large, results will not be very sensitive to changes in $\nu_{pr}$.

Equation 3.1

Prior Residual Variance

The prior residual variance is the variance of the random error term $e_{pr}$. It is denoted $S_{e,pr}^2$ (the square of the prior standard error) and is analogous to the classical regression parameter $S_e^2$.

Equation 3.2

When the prior is based on an old data set or a table of 'pseudo-data', the standard error is determined by running a classical regression on this data set.

3.2 Procedure for Deriving Data-Based Priors and Model-Based Priors

It is possible to derive the information required for an N-prior or G-prior from an old database, an existing model, or subjectively based on interviews with experienced personnel. A number of potential methods for developing priors are discussed in this section. Section 3.4 contains a step-by-step guide for developing a prior based on the orthogonal method, which was the method most commonly used in the joint C-SHRP/agency applications (see Bayesian Modelling: Joint C-SHRP/Agency Applications, Technical Brief #8, Canadian Strategic Highway Research Program).

3.2.1 Data-Based Priors

The priors discussed in this section are derived from data or another model as opposed to being based on interviews with experts. An example based on using an existing database and another based on using a model are described.

The easiest type of prior to use is one based on another database. This database must include the same variables as the experimental data; however, it will probably differ in other respects from the newly collected data. For example, an old set of data may have been collected with less stringent quality control and/or may have been gathered using different measuring devices.

The following steps would be used to develop a prior based on a database.

Creating a Prior using a Database

  1. Enter the prior database into an Excel spreadsheet.
  2. Use XLBayes or BSTAT to perform a classical regression on the prior database. The classical regression results from either software package will fully define an N-Prior. The results will include the coefficient means, variance covariance matrix, standard error and degrees of freedom. A step-by-step guide to using the XLBayes software is contained in Section 3.5.
  3. This information may now be used to perform a Bayesian regression in either XLBayes or BSTAT.
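For readers who want to see what step 2 produces, the following sketch computes the same four quantities with ordinary least squares in NumPy. It is an illustration rather than the XLBayes or BSTAT code; the function name and the small 'old database' are made up, and the degrees of freedom use the usual classical definition (observations minus coefficients, constant included).

```python
import numpy as np

def classical_regression_summary(X, y):
    """Return the quantities an N-prior needs from a least-squares fit.

    X must already contain a column of ones for the constant term.
    """
    n, p = X.shape                       # p = number of coefficients
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                # coefficient means
    residuals = y - X @ b
    dof = n - p                          # degrees of freedom
    s2 = residuals @ residuals / dof     # residual variance
    cov = s2 * XtX_inv                   # variance-covariance matrix
    return {"means": b, "cov": cov, "residual_variance": s2, "dof": dof}

# Made-up 'old database': rut depth (mm) against % air voids only.
air_voids = np.array([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
rut_depth = np.array([12.1, 11.2, 10.9, 10.0, 9.8, 9.1, 9.0])
X_old = np.column_stack([np.ones_like(air_voids), air_voids])

prior = classical_regression_summary(X_old, rut_depth)
for name, value in prior.items():
    print(name, value)
```

The square root of each diagonal entry of `cov` is the standard error of the corresponding coefficient, which is the quantity used in a t-test.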

It should be noted that if equal weighting is given to both the prior and the data, one would expect no difference between the Bayesian regression output and the results one would obtain using classical regression on the combination of the prior and experimental databases. The benefit of a Bayesian analysis in this case is to clearly show how the posterior model is influenced by the prior information. Furthermore, Bayesian regression provides the option for the newly collected database to have more influence on the posterior results if the prior database is of lower quality or precision.
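This equivalence can be checked numerically. The sketch below uses made-up data and a textbook form of the Bayesian update for the coefficient means (prior precision taken as X'X of the old database); it is an assumption-laden illustration, not the XLBayes calculation described in Appendix B.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Generate a made-up data set: constant plus one independent variable."""
    x = rng.uniform(2.0, 5.0, n)
    X = np.column_stack([np.ones(n), x])
    y = 14.0 - 1.0 * x + rng.normal(0.0, 0.5, n)
    return X, y

X_old, y_old = make_data(15)    # prior database
X_new, y_new = make_data(10)    # newly collected experimental data

# Prior from a classical regression on the old database.
A_prior = X_old.T @ X_old
b_prior = np.linalg.solve(A_prior, X_old.T @ y_old)

# Posterior means with the prior and the data weighted equally.
b_post = np.linalg.solve(A_prior + X_new.T @ X_new,
                         A_prior @ b_prior + X_new.T @ y_new)

# Classical regression on the two databases stacked together.
X_all = np.vstack([X_old, X_new])
y_all = np.concatenate([y_old, y_new])
b_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(b_post)
print(b_pooled)
```

Multiplying `A_prior` by a factor smaller than one before combining is one simple way to let the newly collected data dominate when the old database is of lower quality.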

3.2.2 Model-Based Priors

It is possible to develop a prior based on an existing regression model or other type of model. A prior could be extracted from an existing design method, for example. The issue in developing this type of prior is converting the existing model into either an N-prior or a G-prior.

If the prior model is in the same form as the desired regression model, the coefficients used in the existing model may be used as the coefficient means in a G-prior. The G-prior was originally designed to enable the user to derive only the coefficient means without having to derive the variance covariance matrix. The database specified for the G-prior may be the experimental data that is being analyzed. Estimates of the prior residual variance may be determined if the prediction error is specified for the prior model. Prior degrees of freedom must also be estimated and should reflect the number of observations the prior model is based on.

If the prior model is not in the same form as the desired regression model, it may be used to create a table of pseudo data. This 'pseudo-data' consists of a set of independent variable data along with the accompanying dependent variable forecasts made with the prior model. The table of pseudo data is then analyzed with a classical regression, as for the data-based prior, to obtain the regression coefficient means. An estimate of the prior variance may be determined from the prediction error specified for the model. The prior degrees of freedom may be estimated based on the number of observations the prior model is based on. A G-prior may then be run by specifying the pseudo database for the G-prior data set and using a G-prior factor of one.
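The pseudo-data route can be sketched as follows. The existing model here (`prior_model`) and the age settings are hypothetical and chosen only to show the mechanics; the air voids settings are those used in the rutting example.

```python
import numpy as np
from itertools import product

def prior_model(air_voids, age):
    """Hypothetical existing model, deliberately not linear in its inputs."""
    return 4.0 * np.sqrt(age) / np.sqrt(air_voids)

air_voids_levels = [2.0, 3.5, 5.0]      # % air voids
age_levels = [5.0, 10.0, 15.0]          # years (hypothetical settings)

# Table of pseudo data: independent variables plus the model forecast.
rows = [(av, age, prior_model(av, age))
        for av, age in product(air_voids_levels, age_levels)]
pseudo = np.array(rows)

# Classical regression of the forecasts on the desired linear form.
X = np.column_stack([np.ones(len(pseudo)), pseudo[:, 0], pseudo[:, 1]])
y = pseudo[:, 2]
b_prior = np.linalg.lstsq(X, y, rcond=None)[0]

print(pseudo)
print(b_prior)    # constant, air voids coefficient, age coefficient
```

The residual variance from this regression only measures how well the linear form reproduces the existing model, which is why the text recommends taking the prior variance from the model's stated prediction error instead.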

3.3 Methods for Deriving Subjective Priors

A subjective prior is one that is based on the experience and judgment of experts about the process being modeled. Four methods for deriving these priors are discussed in this section. All of these methods are based on an interview process with differences between the methods arising from the nature of the questions asked.

Table 3-3 : Subjective Encoding Methods

Typical Methods of Encoding Subjective Priors

1) Incremental Orthogonal
2) Full Matrix Orthogonal
3) Card Sort
4) Questionnaire

Further discussion of the methods and issues discussed in Section 3.3 can be found in the 'Bayesian Methods: Interpreting and Assembling the Prior' section of the C-SHRP Training Sessions in Bayesian Methods and Software (available on the CD-ROM version of this user's guide).

3.3.1 Incremental Orthogonal Method

The incremental orthogonal method has the expert assess the effect of changing one variable in the performance model while holding all other variables at their average value. This method of developing a prior is the simplest method to use, but it is also the most approximate because it does not consider the interaction between variables. The method can be used to estimate the mean values of the regression coefficients and a simplified variance covariance matrix.

We will illustrate the incremental orthogonal method by determining the mean value of the air voids coefficient in the regression equation for the rutting example, Equation 2-6. An appropriate range of air voids values for our model is Low (2%), Medium (3.5%) and High (5%). We would like the expert to assess the rut depth they would expect for these three settings of air voids while holding all other variables constant at their mean values. The mean values of these variables are:

Table 3-4 : Mean Values of Independent Variables

Variable Name                  Mean Value
Percent Retained on #4 Sieve   42%
Age of Overlay                 10 Years
Overlay Thickness              85 mm
Percent Crushed Particles      65%
Traffic                        110 KESALs per year

An expert gives the following estimates of rut depth considering the mean values of the other variables and the 3 settings of the air voids variable.

Table 3-5 : Incremental Assessments

% Air Voids   Rut Depth (mm)
2             12
3.5           10
5             9

We may now obtain two estimates of the air voids coefficient, b1, in our regression model. The form of the regression model is:

Rut Depth = b0 + b1(AirVoids) + b2(Retained) + b3(Age) + b4(Thick) + b5(Crush) + b6(Traf)

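To determine the 'Low' estimate of b1 we can create two equations based on the rut depth estimates for air voids values of 3.5% and 5%. Because all other variables are held at their mean values, their combined contribution (together with b0) is a single constant.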

10 = constant + b1(3.5)

9 = constant + b1(5)

Subtracting the two equations we get an estimate for b1.

(10 - 9) = b1 (3.5 - 5)

b1 = -0.67

Similarly we can obtain a 'High' estimate of b1.

(12 - 10) = b1 (2 - 3.5)

b1 = -1.33

We can approximate the mean value of b1 by taking the average of the two results (b1 = -1). An approximation of the variance of b1 can also be made with the two results, where L and H are the 'Low' and 'High' estimates.

$$\mathrm{Var}(b_1) = \left(\frac{L - H}{2}\right)^2$$

$$\mathrm{Var}(b_1) = \left(\frac{(-0.67) - (-1.33)}{2}\right)^2$$

$$\mathrm{Var}(b_1) = 0.1089$$
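The arithmetic above can be reproduced with a few lines of Python. This is only a check of the worked example using the values in Table 3-5; the variable names are ours.

```python
# Expert assessments from Table 3-5.
air_voids = [2.0, 3.5, 5.0]       # %
rut_depth = [12.0, 10.0, 9.0]     # mm

# Slope between each pair of adjacent settings.
high = (rut_depth[0] - rut_depth[1]) / (air_voids[0] - air_voids[1])
low = (rut_depth[1] - rut_depth[2]) / (air_voids[1] - air_voids[2])

b1_mean = (low + high) / 2
b1_var = ((low - high) / 2) ** 2

# The same variance using slopes rounded to two decimals, as in the text.
b1_var_rounded = ((round(low, 2) - round(high, 2)) / 2) ** 2

print(round(low, 2), round(high, 2))     # -0.67 -1.33
print(round(b1_mean, 2))                 # -1.0
print(round(b1_var, 4))                  # 0.1111
print(round(b1_var_rounded, 4))          # 0.1089
```

The value 0.1089 results from rounding the two slopes to two decimal places before squaring; carrying full precision gives approximately 0.1111. The difference is immaterial given how approximate the method is.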

Note that this is only a simplified approximation of the variance. In using the above procedure, the high and low estimate of the coefficients may actually be the same (i.e. a consistent linear relationship). A better approach would involve calculating a range of values for the coefficient and eliciting a confidence interval for the variable from the expert.

Limitations aside, we have determined one element on the diagonal of the prior variance/covariance matrix. By assumption the covariance between the coefficients is equal to zero. To complete the N-prior we also need to make direct estimates of $S_e^2$ and the degrees of freedom.

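$$\text{Var/Covar}_{prior} = \begin{bmatrix}
\sigma_{b_0}^2 & 0 & 0 & \cdots & 0 \\
0 & 0.1089 & 0 & \cdots & 0 \\
0 & 0 & \sigma_{b_2}^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma_{b_k}^2
\end{bmatrix}$$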

Another possibility would be to use the preceding method to estimate the means of the regression coefficients directly and then use the G-prior option to obtain the variance/covariance matrix.

The advantage of the incremental orthogonal approach is that it requires the least number of responses from the experts. It gives a quick estimate of the means of the regression coefficients and provides a simple starting point for the variances. The disadvantage is that its simplified form doesn't attempt to capture the covariance that likely exists between the estimates of the coefficients.

3.3.2 The Full Matrix Orthogonal Method

In the full matrix orthogonal method every variable is changed systematically relative to every other variable. All possible combinations of variables are included by systematic enumeration. The expert is required to provide a large number of estimates of the dependent variable, one for each variable combination.
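The systematic enumeration can be sketched with the standard library. The air voids settings below are those used in Section 3.3.1; the age and traffic settings are hypothetical and are included only to show how quickly the number of cells grows.

```python
from itertools import product

levels = {
    "air_voids_pct": [2.0, 3.5, 5.0],    # Low / Medium / High
    "age_years": [5, 10, 15],            # hypothetical settings
    "traffic": ["low", "high"],          # hypothetical settings
}

# One dictionary per cell of the encoding matrix.
cells = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(cells))     # 3 * 3 * 2 = 18 cells
print(cells[0])
print(cells[-1])
```

With six variables at three settings each, the same enumeration would produce 3^6 = 729 cells, which illustrates why the expert is asked for such a large number of estimates.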

An example of the full orthogonal encoding matrix used in the C-LTPP rutting model is contained in Figures 3-1 and 3-2. Note that Figure 3-1 represents the low traffic setting and Figure 3-2 the high traffic setting. After the matrix is filled out by the expert, it is converted into a table of data. A classical regression is then run on this pseudo-data to create the prior parameters required in an N-prior. The detailed step-by-step procedure for developing a prior using the orthogonal method is contained in Section 3.4.

The advantage of the orthogonal approach is that it encompasses the full range of variable settings and combinations. This 'brute force' approach permits calculation of the full variance-covariance matrix in addition to the coefficient means.

One disadvantage of the method is that presentation of the questions in the form of a matrix tends to introduce a systematic anchoring bias into the experts' responses. Many experts will start with an estimate for one cell and then systematically adjust this estimate to fill out every other cell in the matrix. The systematic adjustment is often based on a simple additive linear approach which sums up the individual effects of change to the variables.

Figure 3-1: Encoding Matrix for the Rutting Example (p. 1 of 2)

Figure 3-2 : Encoding Matrix for the Rutting Example (p. 2 of 2)

This bias can result in a very definitive (i.e. confident) prior with small variances in the parameter estimates. However, the expert may be much less confident about the parameters than their responses to the orthogonal matrix would suggest. Section 3.4 contains more information on bias in the full matrix orthogonal approach as well as certain steps that can be taken to minimize it.

Another disadvantage of the orthogonal method is that some of the questions tend to be hypothetical. These "synthetic" questions arise from the systematic combinations in the orthogonal matrix and may not reflect the real world situations the expert is familiar with. As a result, the experts may have little experience upon which to base their responses for certain cells.

3.3.3 The Card Sort Method

The card sort method is based on a series of cards each describing a different combination of the independent variables. The expert is asked to sort the deck in order of increasing performance. In the rutting example, the expert would sort the deck in order of improved rutting performance.

The cards are designed to span an inference space similar to the orthogonal method. However, in contrast to the orthogonal approach, they are designed such that no card contains a single variable change from another card. At least two variables must vary between any two cards. This is purposely done to force the expert to consider the combined effect of changing two variables.

After the cards have been sorted, they are analyzed and converted into a statistical form that can be used as an N-prior in XLBayes. The method used is known as conjoint analysis. Software packages are available to facilitate a conjoint analysis.

The card sort method avoids some of the bias of the orthogonal method; however, it may be difficult to design the deck. Several iterations of the deck may have to be submitted to the experts before an appropriate set of cards that can be effectively ordered is found. Application of the card sort method has been limited in C-LTPP to a demonstration in the training sessions.

3.3.4 The Questionnaire Method

The questionnaire method differs from the full matrix orthogonal approach in that the questions do not systematically enumerate all possible combinations of variables. Instead, each question presents a combination of independent variable values that represents a real world situation. The expert is asked to provide an estimate of the dependent variable for each combination of independent variables. The result of the interview is a table of pseudo data which may be analyzed using classical regression to create an N-prior.

One advantage of the questionnaire method is that it tends to avoid the anchoring and incremental adjustment bias that can occur with the full matrix orthogonal method. Questions are asked one at a time and do not correspond to a simple combinatorial pattern. Furthermore, questions are not hypothetical and represent real world situations.

The questionnaire method is not discussed in any further depth in this user's guide, nor was it used in the C-LTPP related Bayesian analysis projects. However, the method is conceptually very simple requiring an analysis of pseudo-data identical to the full matrix orthogonal method.

3.4 Procedure for the Full Matrix Orthogonal Method

The incremental and full matrix orthogonal methods were described in Section 3.3. A detailed example of the full matrix orthogonal method as used in the 1989-1994 C-SHRP modelling project is presented here.

C-SHRP used four steps to develop a prior using the full matrix orthogonal method. The first step is to prepare an encoding package which describes the problem, identifies particular issues that are to be addressed by the model, and defines the selected dependent variable as well as each of the contributory variables. The 'high', 'medium' and 'low' settings of the contributory variables are also identified. Next, the encoding package is given to several experts for review. Any questions the experts raise are addressed before interviewing begins.

In the interview, the experts are required to fill in an encoding matrix, similar to that presented in Section 3.3.2, with their estimates of the dependent variable values associated with different causal variable combinations. When the interviews are completed the information from each expert is analyzed. The final step in developing a prior using the orthogonal method is to compare the results from each expert in order to identify consensus judgment or any inconsistencies. The result of the four step process is a single set of expert judgment in the form of an N-prior. The steps outlined above are discussed in more detail in the following sections.

3.4.1 Developing the Encoding Package

The primary purpose of the encoding package is to fully describe the problem in order to ensure the experts don't misunderstand what they must assess. The encoding package should contain a complete description of the selected dependent and independent variables, a description of implicit contributory variables that do not appear in the model and a list of all assumptions made.

An encoding package is usually made up of the following elements:

Table 3-6 : Encoding Package Outline

Encoding Package Elements

1. Introduction
2. Description of inference space
3. Definition of dependent variable, Y
4. Definitions of contributory variables
5. Instructions for completing forms
6. Encoding forms

The introduction to the encoding package is used to give the experts some background to the model building project. This includes describing the type of model, the proposed uses for the model and the role their judgment will have in the development process.

The problem of inadvertently introducing bias into the encoding process should be considered before proceeding with the development of the encoding package. Bias can be introduced in the way questions are posed and the way information is presented. Bias could be introduced, for example, by stating that rutting increases as the % of air voids increases. An expert might bias their judgment specifically to make the statement true.

In fact, in the case of % air voids, the opposite sign for the air voids coefficient could also be assumed. High air voids might also mean insufficient compaction of the AC layer and thus more rutting. In this case, bias in the encoding package could potentially spoil the prior. The encoding package should therefore be selective in terms of the information it gives the experts.

The description of the inference space should identify assumptions and implicit variables included in the model. The encoding package should contain a full description of the selected dependent variable including units and method of measurement. The description of the inference space and the dependent variable should be based on the results from Step 2 of the template (Section 2.3).

The definition of each contributory variable should be based on the summary written in Step 4 of the template (Section 2.5). For each variable the units, sources of experimental data, methods of measurement, and references to further information should be described.

The next part of the encoding package is used to provide the expert with instructions on how to complete the accompanying encoding forms. The instructions should contain an explanation of some potential pitfalls associated with the orthogonal method. The most serious of these pitfalls is the problem of anchoring bias when filling out the matrix. Anchoring bias occurs when an expert enters an initial value into the matrix and then fills in adjacent cells by mentally using a simple linear function. For example, consider the case where an expert is asked to enter an estimate of rutting depth after three, six and nine years:

             Age = 3 Years   Age = 6 Years   Age = 9 Years
Rut Depth    ______          ______          ______

Anchoring bias arises when the expert enters a value for six years and then simply enters values for three and nine years according to a simple linear model like the one depicted in Figure 3-3, without considering other, non-linear increments.

Figure 3-3 : Simple Linear Anchoring Bias

Some experts will continue using linear adjustments to fill out the entire encoding matrix. This problem can be alleviated by informing the expert of the potential for problems and encouraging the expert to use his/her own judgment to fill out the matrix instead of an assumed linear function. Another approach is to break up the orthogonal matrix into several pieces and to present each piece to an expert at different times.
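One informal way to screen returned matrices for this pattern is to test whether a row of responses lies exactly on a straight line. The check below is our own illustration, not part of the C-SHRP procedure; the function name, tolerance and rut depth values are hypothetical.

```python
import numpy as np

def looks_linear(settings, responses, tol=1e-6):
    """Flag a row of responses that lies exactly on a straight line."""
    settings = np.asarray(settings, dtype=float)
    responses = np.asarray(responses, dtype=float)
    slope, intercept = np.polyfit(settings, responses, 1)
    fitted = slope * settings + intercept
    return bool(np.max(np.abs(responses - fitted)) < tol)

ages = [3, 6, 9]                                  # years

print(looks_linear(ages, [4.0, 8.0, 12.0]))       # equal increments: True
print(looks_linear(ages, [4.0, 8.0, 10.0]))       # tapering increments: False
```

A flagged row is not proof of anchoring, since the expert may genuinely believe the relationship is linear, but many flagged rows in one matrix suggest the responses deserve a follow-up conversation.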

3.4.2 Choose Experts

The next stage in developing a prior using the orthogonal matrix method is choosing the experts. It is best to use experts with considerable experience including direct experience with both the dependent and independent variables. For example, sources of experts related to pavement performance models include transportation agency design, maintenance and construction staff, engineering consultants and the construction industry.

One thing to keep in mind when using more than one expert is the issue of combining expert judgment or otherwise arriving at some kind of forced consensus of the experts. Combining the judgment of several experts by lumping their pseudo-data together or by averaging or weighting their results, is not recommended.

An expert's judgment is inherently personal and stands on its own. Combining the judgment of several experts is an attempt to force a consensus where one does not exist. Consider an example where two experts disagree about the sign of a coefficient. Averaging their judgment to achieve a consensus means the actual belief of one or the other (or both) is going to be ignored or severely compromised. Step 8 of the template (Section 4.1) addresses the problem of forming a consensus among the experts.
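A two-line calculation makes the sign problem concrete. The coefficient values below are made up for two hypothetical experts who disagree about the direction of the air voids effect.

```python
# Made-up priors from two hypothetical experts (rut depth in mm).
expert_a = {"constant": 13.0, "air_voids": -1.0}   # more air voids, less rutting
expert_b = {"constant": 6.0, "air_voids": 1.0}     # more air voids, more rutting

averaged = {name: (expert_a[name] + expert_b[name]) / 2 for name in expert_a}
print(averaged)    # {'constant': 9.5, 'air_voids': 0.0}
```

The averaged prior states that air voids have no effect on rutting, a position that neither expert holds.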

3.4.3 Conduct Interviews

Once the encoding package has been prepared and the experts selected, the next step is to conduct the interviews. The interviews are usually done independently by fax, mail or email. Generally the interviewer talks with an expert after they have received and reviewed the encoding package in order to ensure the expert is comfortable with the information they have been provided and that they fully understand what they are being asked to do. During this stage of the encoding process the interviewer should be available to answer questions from the experts. Each expert fills out the encoding matrix on their own and then returns it to the interviewer.

3.4.4 Validate Results

The reason for validating the responses in the encoding matrices is to confirm that the experts have understood the dependent and independent variables before completing the rest of the analysis. One simple validation technique is to have each expert describe their interpretation of the dependent and contributory variables, and also state any additional assumptions that they may have made. The interviewer can then verify that the problem has been properly interpreted.

3.4.5 Analyze Pseudo Data

The data from the completed orthogonal matrices is treated as pseudo data and analyzed using classical regression in either BSTAT or XLBayes. Information on running XLBayes is provided in Section 3.5 of this guide (use of BSTAT is very similar). A table of pseudo data for Expert #3's rutting model prior is presented in Table 3-7. This table was obtained by re-arranging the data contained in the encoding matrices presented in Section 3.3.2.

Table 3-7 : Pseudo Data for Expert #3

A classical regression on the pseudo data for Expert #3 produced the regression results shown in Table 3-8 and the variance/covariance matrix in Table 3-9. A step-by-step procedure for obtaining these results is provided in Section 3.5.1. This information can be used directly as an N-prior or as a portion of a G-prior.

Table 3-8 : Regression Results for Expert #3

Table 3-9 : Variance/Covariance Matrix for Expert #3

In some cases, analysis of the pseudo data may result in a residual variance which is unrealistically low. Where the variance does not accurately reflect the expert's opinion, a G-prior and the expert's direct estimate of the variance could be used instead of an N-Prior.

The number of degrees of freedom for the prior may also be adjusted downwards if it is unrealistically large. Typically, posterior results are not too sensitive to this provided the number of degrees of freedom for both the prior and the experimental data are reasonably large. However, it can be significant where the number of experimental data points is very small and the number of cells in the orthogonal matrix is very large.

3.4.6 Assemble Experimental Data

The final step in assembling the information is to prepare the experimental data for analysis. Data for the dependent as well as associated contributory variables are required. The data source for each variable was identified during Steps 2 and 3 of the template. The suitability and completeness of the data was assessed as well.

Additional examples of encoding packages are found in the appendices of the following two joint C-SHRP/agency applications (available on the CD-ROM version of the user's guide):

Saskatchewan - Subgrade Shear Failures - Appendix C (Widger 1995).

Alberta - Predicting Roughness Progression on AC Overlays - Appendix D (Kurlanda 1995).

3.5 Perform Bayesian Regression - Step 7

Once the prior information and the experimental data have been gathered and put in the proper format, the next step is to combine them by performing a Bayesian regression. This may be done with either XLBayes or BSTAT. Operation of the two programs is similar.

To demonstrate the mechanics of using the software, a set of examples is provided in this section based on the rutting model example and XLBayes. In Section 3.5.1, classical regression is used to develop an N-prior based on Expert #3's pseudo data. An example of Bayesian regression using the N-prior option is described in Section 3.5.2. The G-prior option in XLBayes is described in Section 3.5.3. The example spreadsheets RUTXMPL.XLS, RUTCRSLT.XLS, RUTNRSLT.XLS and RUTGRSLT.XLS which accompany the examples are provided on the CD-ROM version of the user's guide.

3.5.1 Performing Classical Regression Using XLBayes

As discussed earlier in this chapter, XLBayes may be used to perform a classical regression. The first step in performing a classical regression using XLBayes is to input the number of terms used in the model and to set the classical regression option. This is accomplished by using the Setup option from the XLBayes menu (Figure 3-4).

Figure 3-4 : XLBayes Menu

This action brings up the XLBayes Setup dialog box (Figure 3-5) which allows the user to specify the type of prior used and the number of variables. For the rutting model there are 6 independent variables and a constant.

Figure 3-5 : Setup Dialog Box Indicating Classical Regression

Once the setup is complete it is necessary to tell XLBayes where the experimental data is located. This is accomplished by selecting the "Sample Data" menu item from the XLBayes menu (Figure 3-4). This will bring up the Sample Data dialog box (Figure 3-6) which prompts the user to select data ranges for both the experimental dependent and independent variables. The cell references in the figure correspond to the file RUTXMPL.XLS, which is included on the companion CD-ROM. The ranges specify the table of pseudo data presented earlier for Expert #3. Selecting the 'Labels' tick box indicates that the names of the variables are contained at the top of the cell ranges given. The user can optionally specify a range of independent variable data for making sample predictions. Use of the predictions option is shown in Section 3.5.3.

Figure 3-6 : Sample Data Dialog Box Specifying Pseudo Data

Once the above information has been specified the analysis can be initiated by selecting the Analyze option at the bottom of the XLBayes menu (Figure 3-4). XLBayes copies the specified data to a new stand alone workbook and performs the classical regression calculations described in Appendix B.

Figure 3-7 : Data Tab in Results Workbook

The output for the example is provided in the spreadsheet RUTCRSLT.XLS, which is contained on the companion CD-ROM. Click on the data tab at the bottom of the results screen (Figure 3-7). You may need to use the tab arrows at the bottom left of the screen to make the data tab visible. Results will be identical to those presented in Tables 3-7 and 3-8.
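The classical regression step produces the quantities that an N-prior requires (Table 3-1): the coefficient means, the variance/covariance matrix, the residual variance and the degrees of freedom. The sketch below shows the textbook least-squares calculation in Python with NumPy. It is an illustration only and is not XLBayes code; the function name `classical_regression` and the small data set are hypothetical and are not Expert #3's pseudo data.

```python
import numpy as np

def classical_regression(X, y):
    """Ordinary least squares with the constant placed in the first column.

    X : (n, p) array of independent variables (no constant column)
    y : (n,) array of the dependent variable
    Returns coefficient means, variance/covariance matrix,
    residual variance and degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])   # add the constant term
    k = A.shape[1]                         # number of terms incl. constant
    if n <= k:
        raise ValueError("need more observations than model terms")
    XtX_inv = np.linalg.inv(A.T @ A)
    b = XtX_inv @ A.T @ y                  # coefficient means
    resid = y - A @ b
    dof = n - k                            # degrees of freedom
    s2 = float(resid @ resid) / dof        # residual variance
    cov = s2 * XtX_inv                     # variance/covariance matrix
    return b, cov, s2, dof

# Hypothetical data generated from y = 1 + 2*x1 - x2 with no noise
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 0, 2, 4]
b, cov, s2, dof = classical_regression(X, y)
print(b, dof)
```

For the rutting model the same calculation would involve seven terms (six independent variables and a constant), so at least eight rows of pseudo data are needed for the degrees of freedom to be positive.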

3.5.2 Performing an N-Prior Bayesian Regression Using XLBayes

The first step in performing an N-prior Bayesian regression using XLBayes is to input the number of terms used in the model and to set the N-prior option. This is accomplished by using the Setup option from the XLBayes menu (Figure 3-4).

This action brings up the XLBayes setup dialog box (Figure 3-8) which allows the user to specify the type of prior used and the number of variables. For the rutting example an N-prior will be used with 6 independent variables and a constant.

Figure 3-8 : Setup Dialog Box Indicating N-Prior

Once the setup is complete, it is necessary to tell XLBayes where the experimental data is located. This is accomplished by selecting the "Sample Data" menu item from the XLBayes menu (Figure 3-4). This will bring up the Sample Data dialog box (Figure 3-9), which prompts the user to select data ranges for both the experimental dependent and independent variables. The cell references in the figure correspond to the file RUTXMPL.XLS, which is included on the companion CD-ROM. The user can optionally specify a range of independent variable data for making sample predictions.

Figure 3-9 : Sample Data Dialog Box

The next step is to select "Prior Data" from the XLBayes menu (Figure 3-4). This will bring up a dialog box (Figure 3-10) that prompts the user for four sets of information: prior variance/covariance matrix, prior regression coefficient means, prior degrees of freedom, and prior residual variance.

Figure 3-10 : N-Prior Data Input Dialog Box

Note that when specifying the vector of regression coefficient means for the prior, the constant should be placed at the top of the column rather than at the end. The order is different elsewhere in this guide (e.g. Table 3-7) because earlier versions of XLBayes placed the constant at the bottom of the column.
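When a prior has been tabulated with the constant last, the means vector can be reordered before it is entered. It is assumed here that the rows and columns of the variance/covariance matrix must follow the same order as the means vector, so the sketch reorders both. The helper names and the numbers are hypothetical.

```python
def constant_first(means):
    """Move the last entry (the constant) to the front of the means vector."""
    return [means[-1]] + list(means[:-1])

def constant_first_matrix(cov):
    """Apply the same reordering to the rows and columns of a square matrix."""
    k = len(cov)
    order = [k - 1] + list(range(k - 1))   # new position -> old index
    return [[cov[i][j] for j in order] for i in order]

# Hypothetical three-term prior listed as [b1, b2, constant]
means = [0.5, -0.2, 10.0]
cov = [[1.0, 0.1, 0.3],
       [0.1, 2.0, 0.4],
       [0.3, 0.4, 9.0]]

print(constant_first(means))            # [10.0, 0.5, -0.2]
print(constant_first_matrix(cov)[0])    # [9.0, 0.3, 0.4]
```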

Once the above information has been specified, the analysis can be initiated by selecting the "Analyze" option at the bottom of the XLBayes menu (Figure 3-4). XLBayes copies the specified data to a new stand-alone workbook and performs the posterior calculations described earlier in this chapter. The output for the example is provided in the spreadsheet RUTNRSLT.XLS, which is contained on the companion CD-ROM.

All calculations on the results spreadsheet are dynamically linked, which means that subsequent changes made to the prior, for instance, are immediately reflected in the posterior results. The results consist of the posterior regression coefficients, prior variance-covariance matrix, posterior degrees of freedom and the posterior standard error. As well, for comparison purposes, the normal distributions of each regression coefficient are plotted for the experimental data, the prior and the posterior. Precision and correlation matrix results are also provided. Select the appropriate tab in the results workbook to view these results. An example of this output and interpretation of modelling results are provided in Section 4.2.
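To make the roles of the four N-prior inputs concrete, the following sketch combines a prior with sample data using the textbook conjugate (normal-inverse-gamma) update for linear regression. This is an assumption made for illustration: the equations in Appendix B may differ in detail, and the function name `n_prior_posterior` and all numbers are hypothetical.

```python
import numpy as np

def n_prior_posterior(b0, V0, s0_sq, dof0, A, y):
    """Combine an N-prior with sample data (textbook conjugate update).

    b0    : prior coefficient means, constant first
    V0    : prior variance/covariance matrix of the coefficients
    s0_sq : prior residual variance
    dof0  : prior degrees of freedom
    A     : (n, k) design matrix whose first column is all ones
    y     : (n,) observed dependent variable
    """
    b0 = np.asarray(b0, dtype=float)
    V0 = np.asarray(V0, dtype=float)
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)

    P0 = s0_sq * np.linalg.inv(V0)               # prior precision, in X'X units
    Pn = P0 + A.T @ A                            # posterior precision
    bn = np.linalg.solve(Pn, P0 @ b0 + A.T @ y)  # posterior coefficient means

    dofn = dof0 + len(y)                         # posterior degrees of freedom
    ss = dof0 * s0_sq + y @ y + b0 @ P0 @ b0 - bn @ Pn @ bn
    sn_sq = float(ss) / dofn                     # posterior residual variance
    Vn = sn_sq * np.linalg.inv(Pn)               # posterior variance/covariance
    return bn, Vn, sn_sq, dofn

# Hypothetical two-term model (constant plus one independent variable)
b0 = [1.0, 2.0]
V0 = [[0.5, 0.0], [0.0, 0.5]]
A = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1.2, 2.9, 5.1, 7.0])
bn, Vn, sn_sq, dofn = n_prior_posterior(b0, V0, s0_sq=0.25, dof0=5, A=A, y=y)
print(bn, sn_sq, dofn)
```

In this formulation the posterior means are a precision-weighted average of the prior means and the data: a prior with small variances pulls the posterior towards the prior, while a diffuse prior lets the experimental data dominate.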

3.5.3 Performing a G-Prior Bayesian Regression Using XLBayes

Performing a G-prior Bayesian regression begins by selecting the "Setup" menu item from the XLBayes menu (Figure 3-4). This action brings up a dialog box (Figure 3-11) that prompts the user to specify the type of prior being used, which is the G-prior in this case.

Figure 3-11 : Setup Dialog Box Indicating G-Prior

Selecting the "Prior Information" menu item from the XLBayes menu (Figure 3-4) brings up the G-Prior Data Input dialog box (Figure 3-12). The G-prior data range, means vector, residual variance and degrees of freedom must be entered, as well as a value for G. The cell references again correspond to the Excel spreadsheet RUTXMPL.XLS included on the companion CD-ROM. Include only numerical values in the ranges, with no text labels.

Figure 3-12 : G-Prior Data Input Dialog Box
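Unlike the N-prior, the G-prior does not take a variance/covariance matrix directly (Table 3-1); the matrix is implied by the G-prior data set, the value of G and the residual variance. The sketch below shows one common way of forming it, Zellner's textbook g-prior convention, in which a larger g gives a weaker prior. This convention is an assumption: XLBayes may scale G differently, and Appendix B should be consulted for the form actually used. The function name and the numbers are hypothetical.

```python
import numpy as np

def g_prior_covariance(X0, g, s0_sq):
    """Implied coefficient covariance under Zellner's g-prior convention.

    ASSUMPTION: V0 = g * s0_sq * inv(X0'X0); XLBayes may scale G differently.
    X0 holds numeric values only and is assumed to include the constant column.
    """
    X0 = np.asarray(X0, dtype=float)
    if g <= 0:
        raise ValueError("g must be positive")
    return g * s0_sq * np.linalg.inv(X0.T @ X0)

# Hypothetical G-prior data set: constant plus one independent variable
X0 = [[1, 0], [1, 1], [1, 2]]
V0 = g_prior_covariance(X0, g=2.0, s0_sq=3.0)
print(V0)
```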

The next step is to select the "Sample Data" option from the XLBayes menu (Figure 3-4). This will bring up a dialog box (Figure 3-13) that prompts the user to select data ranges for both the experimental dependent and independent variables. The user can optionally specify a range of independent variable data for making sample predictions.

Figure 3-13 : Sample Data Reference Dialog Box

To complete the analysis, choose "Analyze" from the XLBayes menu (Figure 3-4). The program will calculate the results and put them in a new workbook. To view results, select the appropriate tab from the results workbook. The prediction results are contained on a separate tab sheet entitled 'Predictions', which can be found before the 'Prior', 'Data' and 'Posterior' tabs. The results for this example are contained in the spreadsheet RUTGRSLT.XLS, which can be found on the companion CD-ROM. An example of interpretation of modelling results is contained in Section 4.2.
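The optional prediction range supplies rows of independent variable values to which the fitted model is applied. As a minimal sketch of that idea, the following plain Python function computes point predictions from a set of coefficients with the constant first. It does not reproduce any prediction statistics that XLBayes may report on the 'Predictions' tab, and the coefficients and rows shown are hypothetical.

```python
def predict(coefficients, rows):
    """Point predictions from a linear model; coefficients[0] is the constant."""
    constant, slopes = coefficients[0], coefficients[1:]
    predictions = []
    for row in rows:
        if len(row) != len(slopes):
            raise ValueError("each row needs one value per independent variable")
        predictions.append(constant + sum(b * x for b, x in zip(slopes, row)))
    return predictions

# Hypothetical posterior coefficients and two rows of prediction data
posterior = [1.0, 2.0, -0.5]
new_rows = [[1.0, 2.0], [3.0, 4.0]]
print(predict(posterior, new_rows))   # [2.0, 5.0]
```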

