
Design of a Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program

 

 

2.3 THE BAYESIAN STATISTICAL APPROACH TO PAVEMENT MONITORING

2.3.1 Mathematical Development

This section extends the regression discussion of the linear model presented above, applying a Bayesian approach. The Bayesian approach begins in quite the same way as the regression approach, with a postulated model of pavement deterioration and a series of observations. We will retain the same linear model as we used in the previous section, y = bx + e. We will retain the same n observations on y and x discussed in the previous section, which define n equations

Equations 24-26

In implementing the Bayesian approach, we must make an important assumption from regression analysis very explicit: a probabilistic assumption regarding the nature of the error terms in equations (24)-(26). In particular, we postulate a specific mathematical equation, a probability distribution, that quantifies the error terms e1,e2,...,en. We could postulate any probability distribution we wish, but we will postulate that the error terms are governed by independent, identically distributed, normal distributions with mean zero and variance s2:

Equation 27

This assumption (commonly referred to as homoskedastic, non-autocorrelated errors) is not as benign as it might appear at first blush. We assume that the error terms are all governed by the exact same normal probability distribution with the exact same zero mean and the exact same variance s2.

Assuming the n error terms are independently sampled from the same normal distribution, the likelihood function given n observations is the probability that the specific errors e1,e2,...,en will occur. The probability that the specific errors e1,e2,...,en will occur is the probability that precisely these n samples will be selected from the normal distribution. The answer is:

Equation 28

where

Equation 29

in which R2 is given by equation (20) and c1 is a normalizing constant that ensures that equation (28) integrates to unity. Equation (29) is simply the unbiased estimate of the variance of the error terms s2 in equation (6) above from regression methods.
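The role of the likelihood can be illustrated numerically. The following is a minimal sketch, assuming a scalar independent variable x and the no-intercept model y = bx + e with independent normal errors; the function names and the data values are hypothetical and are not drawn from the C-LTPP program. It evaluates the log of the joint normal density of the residuals and shows that the least squares slope is the value of b that maximizes it for a fixed variance.

```python
import math


def gaussian_log_likelihood(xs, ys, b, sigma2):
    """Log of the joint normal density of the residuals e_i = y_i - b * x_i."""
    n = len(xs)
    sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sse / (2.0 * sigma2)


def least_squares_slope(xs, ys):
    """Through-origin least squares estimate of b for the model y = b*x + e."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical observations (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

b_hat = least_squares_slope(xs, ys)
sigma2 = 0.05  # assumed error variance, fixed for the comparison

# The likelihood is highest at the least squares slope and falls off on either side.
for b in (b_hat - 0.1, b_hat, b_hat + 0.1):
    print(f"b = {b:.3f}  log-likelihood = {gaussian_log_likelihood(xs, ys, b, sigma2):.3f}")
```

The comparison holds for any fixed positive variance, because the variance only rescales the sum of squared residuals in the exponent.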

We assume that the prior probability distribution over the parameters b of the model and the variance s2 of the random terms in the model is given by the following rather general functional form

Equation 30

where c0 is a normalizing constant such that the integral is unity. The terms k, M, m, and b0 are parameters we shall use to quantify our prior state of information. The parameter b0 represents the mean of our prior estimate of the model parameters b. (The number in the missile example analogous to b0 is the initially estimated failure probability of 1/3.) We will discuss how these parameters will be set below. The prior probability distribution is a joint probability distribution over model parameters b and variance s2 and therefore implicitly represents our total state of information before any observations are made, i.e., before any data is collected by the C-LTPP program. In particular, the prior probability distribution represents not only what we believe to be true about the pavement deterioration parameters b but also what we believe to be true about the random disturbance terms ei.

The best summary measure of what we know about pavement deterioration before any long term monitoring is performed is contained in the marginal distribution over the model parameters b consistent with the joint prior distribution in (30). Integrating out the error parameter s in (30), we obtain the following marginal distribution over the parameter b

Equation 31

where B(m,n) is the "beta function"

Equation 32

The variance of the variable b in equation (31) can be shown to be

Equation 33

as long as the variable k is strictly larger than 4.

If we examine the mathematical expression for the probability density function called the t-distribution, we can show from equation (31) that the variable

Equation 34

is distributed according to a t-distribution with degrees of freedom k-2. Equation (34) reemphasizes that the mean estimate of the model parameters b before any data is gathered is b0, i.e., the mean of the prior probability distribution over the model parameters is b0. Equation (33) indicates that the variance of the prior distribution grows as M grows and decreases as m and k grow.

Figure 2-4

Figure 2-4 plots the marginal distribution over the model parameters b for various values of k. As we shall show later, the parameter k can be interpreted heuristically as the "size of the preexisting data base" that underlies the prior probability distribution. Large values of k imply that the prior probability distribution is based on large quantities of data (and thereby less uncertainty), while small values of k imply less prior data (and thereby more uncertainty). Notice in the figure that larger values of k imply tighter prior distributions about the mean b0, i.e., better prior knowledge.

We now combine the prior probability distribution in equation (30) with the likelihood function in equation (28) as dictated by Bayes' theorem. Applying Bayes' theorem shows that the posterior joint probability distribution over the model parameters b and the uncertainty s2 in the random terms has the same mathematical form as the prior in (30). The posterior joint distribution over the model parameters b and the error term s2 is

Equation 35

where b is as defined in equation (5). The term c1 is a normalizing constant that renders the integral of the joint probability distribution in (35) equal to unity. Notice the direct analogy in form between the prior distribution in equation (30) and the posterior distribution in equation (35). This property, commonality of form between the prior and posterior, is called a conjugate relationship. Conjugate prior and posterior distributions such as (30) and (35) are extremely convenient because they forestall much analytical complexity.

It is not particularly useful for our purposes to analyze the joint probability distribution over the model parameters b and the variance s2 in the error terms. Rather, the critical result we seek is the unconditional (marginal) posterior distribution over the model parameters b. This unconditional (marginal) posterior distribution over the model parameters b contains the richest and fullest possible representation of both the prior information and the new data gathered during the pavement monitoring process.

The unconditional (marginal) posterior distribution over the model parameters b is given by integrating equation (35) over all possible values of s2. Performing the necessary integration, we can show that the marginal posterior distribution over the model parameters b is

Equation 36

where c2 is a normalizing constant ensuring that the expression integrates to unity. If we define the following terms in equation (36),

Equations 37-39

the probability distribution in equation (36) can be written in the form

Equation 40

where B(.,.) again represents the beta function. If we examine the mathematical expression for the t-distribution, we can show from equation (40) that the variable

Equation 41

is distributed according to a t-distribution with degrees of freedom n+k-2. The t-distribution with degrees of freedom n+k-2 has a mean of zero and a variance of (k+n-2)/(k+n-4) as long as k+n > 4. This means that the variable b* in equation (38) is the mean of the marginal posterior distribution over the model parameters b. Furthermore, the variance of the probability distribution in equation (40) can be shown to be

Equation 42

as long as k+n > 4.
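The variance quoted above for the standardized variable, (k+n-2)/(k+n-4), can be checked numerically. The sketch below is illustrative only: it assumes the standard Student t density with n+k-2 degrees of freedom, and the function names and the example values n = 10 and k = 6 are hypothetical. It compares the closed-form variance with a midpoint-rule estimate of the second moment.

```python
import math


def student_t_pdf(t, dof):
    """Density of the standard Student t-distribution with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2.0) - math.lgamma(dof / 2.0)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2.0 * math.log1p(t * t / dof))


def t_variance_closed_form(n, k):
    """Variance (k+n-2)/(k+n-4) of a t variable with n+k-2 degrees of freedom."""
    if n + k <= 4:
        raise ValueError("variance is finite only when k + n > 4")
    return (k + n - 2) / (k + n - 4)


def t_variance_numeric(dof, half_width=100.0, steps=200_000):
    """Midpoint-rule estimate of the second moment (the mean is zero by symmetry).

    The integral is truncated at +/- half_width, so accuracy degrades for
    heavy-tailed cases where dof is only slightly above 2.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        t = -half_width + (i + 0.5) * h
        total += t * t * student_t_pdf(t, dof)
    return total * h


n, k = 10, 6  # hypothetical sample size and prior "data base size"
print("closed form:", t_variance_closed_form(n, k))
print("numeric:    ", t_variance_numeric(n + k - 2))
```

The closed form is undefined at k+n = 4 and the second moment diverges for smaller values, which is why the text requires k+n > 4.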

Examination of the expression in equation (38) for the mean of the marginal posterior distribution over the model parameters b reveals a critically important finding. In particular, the mean of the marginal posterior distribution is a weighted average of the mean of the prior distribution b0 and the classical least squares estimate b from equation (5):

Equation 43

This equation shows pavement engineers precisely and unambiguously how to weight their prior state of information versus the new information gained from the C-LTPP program! This result is a linchpin of Bayesian statistical methods, allowing decision makers to balance new evidence as it comes in against existing experience and practice and to know precisely how that balance should be made. Evident from the structure of equation (43) is the insight that as more and more data are gathered, the first weighting term (the weighting of the least squares estimate b) approaches unity and the second weighting term (the weighting of the prior estimate b0) approaches zero. No matter what the specific parameters m, M, and k of the prior distribution in equation (30), the more data that are gathered, the more closely the Bayesian approach approximates the regression approach. However, equation (43) tells us precisely what our estimate of the mean value of the model parameters b* should be as an explicit function of the particular data gathered. There is no caveat or apology for "small sample sizes." There is always an explicit estimate, one that begins with the prior mean b0 and proceeds inexorably toward the regression result b.
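The weighting behavior described above can be sketched in a few lines. The example assumes the scalar no-intercept model y = bx + e and assumes that m acts as a prior precision weight, so that the data weight is Sxx/(Sxx + m) with Sxx the sum of squared x values, as in the standard conjugate analysis. The exact weights of equation (43) are not reproduced here and may differ in detail; the function name and all numerical values are hypothetical.

```python
def posterior_mean(xs, ys, b0, m):
    """Weighted average of the least squares slope and the prior mean b0.

    Assumption: the data weight is Sxx / (Sxx + m), where Sxx = sum of x_i^2
    and m is treated as a prior precision weight. Returns (b_star, data_weight).
    """
    sxx = sum(x * x for x in xs)
    b_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    w_data = sxx / (sxx + m)
    b_star = w_data * b_hat + (1.0 - w_data) * b0
    return b_star, w_data


# Hypothetical prior and observations (illustrative only).
b0, m = 1.5, 30.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# As observations accumulate, the data weight rises toward one and b* moves
# from the prior mean toward the least squares estimate.
for n in range(1, len(xs) + 1):
    b_star, w = posterior_mean(xs[:n], ys[:n], b0, m)
    print(f"n = {n}  data weight = {w:.3f}  b* = {b_star:.3f}")

# A vanishingly small m (a nearly diffuse prior) essentially reproduces the
# regression estimate.
print(posterior_mean(xs, ys, b0, 1e-9))
```

Under this assumed form, a large m holds the estimate near b0 until a substantial amount of data has been gathered, while a small m lets even a few observations dominate.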

More important than having a continuously updated estimate of the parameters b* of the structural model, we actually have a probability distribution over the model inputs and results. The posterior distribution over the model parameters b and the error term variance s2 as defined by equation (35), in combination with the structural model y=f(x,b), implies a probability distribution over pavement deterioration y for any setting x of the independent variables. In particular, the probability distribution {b,s| D,I} in equation (35) in combination with the structural model f(x,b) + e implies a probability distribution over the pavement performance parameter y given the structural model f(x,b). The probability distribution thus derived over pavement performance can be abbreviated

Equation 44

This probability distribution over pavement performance y as a function of the independent variables x (which will be measured under the C-LTPP program) is the fundamental grist from which pavement management decisions are made. In particular, at every point in the evolution of the C-LTPP program, we will have a complete probability distribution over pavement performance y as a function of the observed variables x. This probability distribution

· is explicit.

· is quantitative.

· supports pavement management decisions from the first day of the program to the last.

· balances all prior information and all program information.

· works no matter how large or small the data base assembled by the program.

· approaches the regression situation if the data base ever becomes sufficiently large and comprehensive.

· is easily communicated to pavement decision makers, who will need to know how to balance prior experience against C-LTPP data.

To support traditional, deterministic pavement management models, we can compute the expected value of pavement performance y as a function of the observed variables x. Implicitly, this is the same as delivering the following functional form for the predictive model

Equation 45

based on the mean parameter estimates b* from the posterior distribution to highway agencies. We will discuss the deterministic approach based on the deterministic relationship in equation (45) in some detail in the next section.

Better than relying on deterministic methods is to deliver the entire probability distribution itself, i.e., {y | x}, to a probabilistic semi-Markovian model to support inherently probabilistic decision making. We have discussed this approach at length at the Second North American Conference on Managing Pavements, and we will summarize the most salient elements of our approach two sections hence.

2.3.2 Representing Prior Knowledge Using a Prior Probability Distribution

It is important at this point to discuss the issue of a prior probability distribution head-on. Why should one be burdened with having to estimate a prior probability distribution over the structural model parameters b when regression methods seem to require no such input? How can one reliably quantify what is known before the C-LTPP program is initiated? The answer, as we shall demonstrate here, is that regression methods do implicitly use a prior probability distribution of precisely the form in equation (35). Unhappily, the implicit prior probability distribution used in regression methods is completely unrealistic, i.e., it tacitly relies on settings for the parameters m, M, and k in equation (30) that are completely unrealistic. We will show that it is far preferable and far more realistic to assemble reasonable parameter values m, M, and k that characterize prior probability distributions that truly represent available information than it is to rely on the completely unreasonable prior probability distribution inescapably resident in regression analysis. The remainder of this section derives the prior probability distribution that is implicitly resident within the regression approach.

Pursuing the relationship between the regression approach and the Bayesian approach further, it is never correct to assume that the prior state of information is zero. Even in the missile problem in Section 2.2 in which there had been no prior flights, there was nonetheless prior information available. The Bayesian approach was able to use it, but the regression approach systematically ignored it.

Does the regression approach make any tacit assumptions about the prior state of information? The answer is yes. In particular, if the prior distribution over model parameters is assumed to be uniformly distributed from negative infinity to positive infinity, the Bayesian approach gives an identical estimate to the regression approach. Expressed alternatively, the regression approach assumes tacitly that the prior state of knowledge before the C-LTPP program is initiated is zero; nobody knows anything. Nobody has any inkling whatsoever of the mechanism of pavement deterioration or of the parameters that affect it. Yet we know this is not true. Working pavement engineers possess a good deal of first hand knowledge of many aspects of pavement deterioration in their service areas. The experience base of working pavement engineers is far from trivial and should not be ignored. Our Bayesian approach provides an explicit way to begin with that knowledge and experience base and to systematically evolve away from it, i.e., to "learn," as the C-LTPP program delivers better and more controlled data.

The regression approach would ask working pavement engineers to immediately abandon current practice and experience in favor of statistical data from C-LTPP, for it would discount all prior information. Prudent pavement engineers would never discard their long experience in favor of a small amount of data no matter how accurate. Rather, prudent pavement engineers would weigh new data (new evidence) against their pre-existing experience, initially weighting the new data at a very low level but gradually weighting the new data more heavily as more definitive data is collected. The largely intuitive process by which pavement engineers weigh the evidence of new data against their past experience is precisely replicated by the Bayesian statistical method. In fact, the Bayesian statistical procedure as we see below calculates explicit weights for prior experience versus new data and helps pavement engineers understand the appropriate relative weights over time. Equation (38) was precisely such a weighting scheme, telling pavement engineers to begin with their prior estimate b0 and gradually evolve over time to the statistically derived result b, initially weighting the former more heavily but eventually weighting the latter more heavily.

Providing explicit guidance for how pavement decision makers should weight past and future information will greatly enhance the acceptance and impact of the C-LTPP data as it is assembled.

Turning now to several technical issues surrounding the prior distribution in equation (30), we should emphasize that the mathematical form of the prior distribution in equation (30) contains a sufficient number of parameters (k, M, m, and b0) that it can approximate virtually any prior probability distribution one might wish. For example, we can approximate the situation of maximum prior uncertainty (i.e., no idea at all what the structural model parameters b are) using a uniform probability distribution by specifying the parameter M at a very large value (approaching infinity), the parameter m at a very small value (approaching zero from above), and the parameter k at a small value (approaching 1 from above). Such settings of the parameters m, M, and k render the marginal distribution in equation (30) to be uniformly distributed between negative infinity and positive infinity. We term such a prior probability distribution a "diffuse prior." We have already mentioned the fact that the Bayesian statistical procedure using a diffuse prior gives exactly the same estimates of the model parameters b as the regression procedure. The mathematical form in equation (30) is sufficiently general that the regression procedure is achieved as a trivial and unrealistic special case of the Bayesian approach, i.e., vanishingly small m, k slightly above 1, and very large M.

We emphasize that use of the proposed Bayesian procedure does not preclude use of the regression procedure. Indeed, the regression result can be achieved as a special case. However, the Bayesian procedure allows a rich range of additional cases that allow better quantification of prior information and balance between that initial information and C-LTPP information. In particular, the mathematical form in equation (30) allows prior probability distributions that are not uniformly distributed between negative infinity and positive infinity to be used. Settings of the parameters m, k, M, and b0 other than the extreme values mentioned previously cause the prior probability distribution to "cluster" about the mean estimate b0 in a symmetrical fashion much like a normal probability distribution and to have a finite standard deviation. Settings of the parameters m, M, and k control not only the standard deviation but also the "fatness" of the tails of the prior probability distribution.

In our experience, there are enough degrees of freedom in the functional form in equation (30) and the parameters m, M, and k to approximate virtually any symmetrical prior probability distribution. This is a critical insight, for it allows us to encode prior probability distributions subjectively using well established techniques and thereafter to approximate those probability distributions using the mathematical form in equation (30). Such approximation can be accomplished systematically using a technique such as least squared error, graphically, or subjectively. Such approximation is well within the bounds of error resident in the subjective prior probability distribution itself and certainly does not compromise the results. On the contrary, the diffuse prior assumption intrinsically resident in the regression technique is so much worse that virtually any approximate technique is bound to be better.
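The least squared error approximation mentioned above can be sketched as a simple grid search. This is an illustration only, under several assumptions: the elicited prior is summarized as a handful of (parameter value, density) points, the fitted family is a location-scale t density centered on b0 (the family to which the marginal prior belongs according to the discussion of equation (34), with k-2 degrees of freedom), and the mapping from the fitted scale back to M and m is not reproduced here. All names and numbers are hypothetical.

```python
import math


def scaled_t_pdf(b, b0, scale, dof):
    """Density at b of a t-distribution centered at b0 with the given scale and dof."""
    z = (b - b0) / scale
    log_norm = (math.lgamma((dof + 1) / 2.0) - math.lgamma(dof / 2.0)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2.0 * math.log1p(z * z / dof)) / scale


def fit_prior(points, b0, scales, dofs):
    """Grid search minimizing squared error between elicited and fitted densities.

    Returns (squared_error, scale, dof) for the best grid point.
    """
    best = None
    for scale in scales:
        for dof in dofs:
            err = sum((scaled_t_pdf(b, b0, scale, dof) - d) ** 2 for b, d in points)
            if best is None or err < best[0]:
                best = (err, scale, dof)
    return best


# Hypothetical elicited density values around a prior mean of 2.0.
elicited = [(1.4, 0.15), (1.7, 0.70), (2.0, 1.20), (2.3, 0.70), (2.6, 0.15)]
scales = [0.05 * i for i in range(2, 21)]   # candidate scales 0.10 .. 1.00
dofs = [3, 4, 6, 8, 12, 20]                 # candidate degrees of freedom (k - 2)

err, scale, dof = fit_prior(elicited, b0=2.0, scales=scales, dofs=dofs)
print(f"best scale = {scale:.2f}, dof = {dof} (k = {dof + 2}), squared error = {err:.4f}")
```

A finer grid or a numerical optimizer could replace the grid search; the coarse grid is used here only to keep the sketch self-contained.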

Probability elicitation techniques motivated initially by Savage (1972) and advanced by Kahneman, Slovic, and Tversky (1981) will be particularly useful in quantifying preexisting information that resides "in the heads" of working pavement engineers as a starting point for the long term pavement performance program. Some might argue that such information elicited subjectively from cognizant, experienced pavement engineers is arbitrary and worthless. We would counter that it is better, indeed far better, than the naive assumption implicit in regression methods that nobody knows anything that is not embedded in C-LTPP data gathered under the program. Bayesian statistical methods give a balance, indeed a proper balance, between prior expert judgment and statistical evidence assembled under this program.

We must reiterate that nobody can force field engineers to adopt the results of long term pavement performance monitoring. They will continue to do exactly what they want. In order for them to embrace the results of long term pavement performance monitoring, they must understand its relevance and its proper balance relative to the practices upon which their careers are founded. What pavement engineer of twenty or more years tenure would abandon established and accepted practice in favor of a small amount of pavement monitoring data? Who would risk the ire of the traveling public in adopting "radical, new, unproven methods" emerging from embryonic, long term pavement monitoring? Assuredly the answer is none. By contrast, pavement engineers would probably be positively disposed toward balancing their established procedures against initially small but growing, emerging, long term pavement monitoring information that leads to incremental improvements in established practice. Such careful and prudent evolution would decrease, not increase, the risk of using new methods and ensure their ultimate proliferation.

Some would argue that the need for a prior distribution introduces a large measure of subjectivity into an otherwise highly scientific program designed to carefully monitor long term pavement performance. While we would agree, we would counter that it is the subjectively held opinions of pavement engineers that lie at the heart of pavement design and management decisions and that must be modified if long term pavement monitoring is ever to succeed. If the visceral models of pavement deterioration held by decentralized decision makers all over the world are not changed, then long term pavement performance monitoring is fundamentally futile.

In summary, we reiterate several key observations regarding the prior probability distribution as quantified by equation (30):

· It can approximate adequately any prior state of knowledge that exists by appropriate setting of the parameters.

· It is infinitely preferable to the diffuse prior assumption intrinsic in the regression approach, which systematically ignores any and all prior information.

· Its impact on the result will be diminished gradually the more statistical evidence that is assembled during the C-LTPP program. The major effects of the prior occur while the C-LTPP sample size is small.

As the C-LTPP sample size becomes large, the Bayesian and regression results converge. Realistically, just as with the simple missile example, the sample size will never become large relative to the number of unknown factors. Small sample size will be a permanent affliction of LTPP. The Bayesian approach will be permanently better than the regression approach.
