
Design of a Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program

 

Section 2

TECHNICAL DISCUSSION OF THE PAVEMENT MANAGEMENT PROBLEM

This section offers a more systematic and technical discussion of the problem of long term pavement performance monitoring and its relationship to pavement management and design decisions. Understanding this technical discussion is critical to understanding the desiderata of pavement monitoring, performance prediction, and pavement design and management.

The ultimate objective of the long term pavement monitoring and measurement program is to provide data that allows specification of a functional relationship between a set of pavement performance variables (denoted y and called by statisticians "dependent" variables) and a set of precursor variables (denoted x and called by statisticians "independent" variables), i.e., y = f(x). With regard to the long term pavement performance monitoring program, the independent variables x will include all variables monitored, i.e., those variables that can prospectively affect pavement performance such as weather, traffic, loads, temperature, and maintenance treatments applied. The dependent variables y will include those pavement performance variables that must be predicted, e.g., roughness, ride comfort measures, imposed vehicle operating costs, transverse cracking, longitudinal cracking. The predominant objective of a protracted monitoring, tracking, and measurement activity is to provide a better and better estimate over time as more observations are made of the fundamental pavement performance relationship y = f(x).

Recognizing that perfect, certain, and complete understanding of a complex and confounded process such as pavement deterioration is impossible, we must recognize in the fundamental pavement performance equation an "error term," a random variable that acknowledges the inevitable imperfections in measurement or in the specification of the function f(x). For example, there may be other variables beyond x that influence y, but we may have inadvertently or systematically left them out. Assigning an analytical form to the error term e is a critical element of a statistical procedure, one that will be at the center of our discussion in this section. To explicitly quantify this intrinsic uncertainty, we write y = f(x) + e where e is an error term, i.e., a random term, that embodies what is not known.

It is critically important to note that pavement performance y is a random variable because e is a random variable. In effect, we have defined a model of the pavement deterioration process that contains a systematic structural element f(x) and a random element e such that the combination of the two comprises a probabilistic model of pavement deterioration. This structure recognizes the reality that no amount of pavement monitoring no matter how intensive can eliminate uncertainty. Monitoring can reduce but never eliminate uncertainty. We shall return to this theme repeatedly, the theme that the output of our model of the pavement deterioration process inferred from the long term pavement monitoring program is a probability distribution over pavement performance whose arguments are a list of independent variables (designated x).

The message that no amount of pavement monitoring no matter how intense or comprehensive can ever eliminate uncertainty is not a message that long term pavement performance monitoring is futile. On the contrary, long term pavement performance monitoring can pay tremendous benefits if we recognize that its primary function is to systematically reduce but never eliminate uncertainty and that the primary objective of the users is to make better decisions as uncertainty is reduced.

Operationally, we characterize the pavement deterioration function f(.) as a mathematical function that has as its arguments not only the independent variables x but also a number of numerical parameters designated collectively b. We will write the pavement deterioration relationship y = f(x,b) + e to emphasize the distinction between independent variables x and parameters b.

The operational aim of the pavement monitoring program is to

· make a series of observations (x1,y1),(x2,y2), ...,(xn,yn) of the independent variables and the corresponding dependent variables. This series of observations will be assembled over time from a number of monitoring sites.

· hypothesize a specific mathematical function f(x,b) that characterizes pavement deterioration.

· use the foregoing series of observations to draw quantitative conclusions about the function f(.),

- the parameters b to be used in the function f(.), and most importantly

- the deterioration process for the pavement itself.

· assess the appropriateness of the postulated function f(.) and the estimated parameters b as a representation of pavement deterioration.

· deliver the postulated function f(.) and the estimated parameters b to pavement management systems for use in decision making.

Once the observations are made, there are two basic approaches to the problem of estimating the "best" coefficients b and determining the "best" deterioration function f(x,b), the classical statistical or regression approach and the Bayesian approach. Preliminary research in the SHRP program in the United States has focused entirely on the "plain vanilla" multiple regression approach. For reasons we will articulate in this report, we believe the plain vanilla multiple regression approach will prove to be unworkable because of the small sample size problems that will beset both Canada and the United States and because it is incapable of generating robust probabilistic output suitable for use by explicit, quantitative pavement management systems. As we will show here, the Bayesian approach is more fruitful and much more capable of ensuring the success of long term pavement monitoring activities because it specifically overcomes the small sample size problem and systematically produces increasingly accurate probabilistic estimates of pavement deterioration. The remainder of this section will present the regression approach to analyzing the observed data and offer the much more utilitarian Bayesian statistical approach that can overcome the intrinsic small sample size and other problems that will systematically beset the long term pavement monitoring program.

2.1 THE CLASSICAL STATISTICAL (REGRESSION) APPROACH TO DEVELOPING PAVEMENT DETERIORATION FUNCTION AND INPUT PARAMETERS

2.1.1 Methodology

The most common approach attempted in long term pavement monitoring and other contexts is the traditional classical statistical approach, which we shall term the regression approach. The approach is appealing because of its simplicity and its widespread understanding and acceptance, but as we shall see, it cannot realistically meet the political and technical needs of long term pavement performance monitoring and inference. In particular, regression methods cannot realistically deliver interim results, i.e., work in progress, in a form that can support real-world decision making. Regression methods are destined to suffer from a potentially mortal "critical path" problem: All statistics must be in and all statistical analysis must be complete before information from the pavement monitoring program can definitively affect real-world pavement decisions. Classical statistical estimation must be complete, and the number of underlying observations must be sufficiently large (and sufficiently long term), before statistical fitting methods can support real-world decisions.

Classical statistical methods encourage the overly simplified view that all the C-LTPP program needs to do is collect mountains of data under carefully structured conditions, wait the fifteen or more years until definitive life cycle data is in, thereafter perform standard regression analysis to determine what that data implies for pavement deterioration, and thenceforth know with certainty the true pavement deterioration mechanism at work. In a word, "Gather enough 'clean' data so that the statistical methods to be applied some fifteen years hence will be definitive." Such an approach is abstract statistical idealism. Even if funding agencies were willing to bet fifteen years of funding on the premise that future results will be definitive, we believe the small sample size and other statistical problems intrinsic in the LTPP program design will obviate any definitive results.

Rather than "betting the farm" that fifteen years' worth of careful data gathering will be funded and will solve the problem, we have instead designed a more pragmatic approach that can provide continually improving results throughout the entire history of the C-LTPP program, beginning today and culminating when the program ultimately terminates. Our approach, which will use precisely the same observations as regression techniques, will allow continuously improving estimates of the pavement deterioration function f(.) and the parameters of that function beginning with the very first observations and continuing indefinitely thereafter.

If we think of long term pavement performance monitoring as "narrowing" the range of uncertainty (as the Bayesian approach does), consider that the early results of the C-LTPP program will narrow the range of uncertainty only to a modest degree and that later results will induce greater and greater narrowing. However, the degree of narrowing even for the first C-LTPP data will be discernible and significant, and our recommended approach will quantify it and deliver it to pavement management systems in a form that is immediately useful for decision making.

The approach we will put forth here extends the regression approach in a straightforward fashion and overcomes the troublesome small sample size and structural difficulties of regression analysis. Indeed, regression results as conceived in the SHRP documentation can be achieved as special cases of the more general Bayesian statistical results we will present and advocate below. The remainder of this section contains an elementary mathematical discussion of the regression method to serve as a backdrop against which to motivate the proposed Bayesian approach.

For simplicity of exposition, we will assume that the postulated pavement deterioration function is linear, i.e., y = f(x,b) can be written y = bx. (No generality will be lost by making this simplifying assumption. We can easily generalize to nonlinear pavement deterioration functions as well as to linear deterioration functions with nonzero intercepts.) The n observations on y and x (these observations are the "product" of the pavement monitoring activities) prescribe a system of equations

Equations 1 to 3

The aim of regression techniques is to calculate the parameter b that in a gestalt sense would most accurately replicate the observed data using the foregoing system of equations.

The most common technique of finding the best parameters b is the method of least squares, which seeks to find the parameter b that minimizes the sum of the squares of the error terms, i.e., that minimizes

Equation 4

Application of the least squares technique will give the following expression for the parameter b:

Equation 5

It is critically important to notice that the estimate of the pavement performance parameters b depends only on observations assembled under the program. There is absolutely no consideration whatsoever of any preexisting information, expertise, experience, knowledge, or practice.
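A minimal sketch of the least squares step follows. It assumes equation (5) is the standard no-intercept estimator, b = sum(x_i * y_i) / sum(x_i^2), which is what minimizing the sum of squared errors yields for the model y = bx. The function name and the observations are hypothetical and are not taken from the report's tables.

```python
def least_squares_slope(xs, ys):
    """Slope b that minimizes sum((y - b*x)**2) for the no-intercept model y = b*x.

    Assumed form of equation (5): b = sum(x*y) / sum(x*x).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("at least one x must be nonzero")
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b_hat = least_squares_slope(xs, ys)
print(f"estimated slope: {b_hat:.4f}")
```

Note that the function takes only the observed pairs as input; there is no argument through which prior knowledge about b could enter, which is the limitation the text emphasizes.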

The natural question that arises is: How good an answer is the estimate in equation (5)? How much data does one need to be sure that the answer in equation (5) is accurate?

If one were trying to predict the mechanism of deterioration, how much data is enough? The answer is found by computing an "unbiased" estimate of the potential error, i.e., the variance, of the estimate in equation (5). In order to define the notion of variance, we must make an explicit assumption regarding the nature of the error terms in equations (1)-(3). In particular, regression methods assume that the error terms are independent, identically distributed random variables with mean zero and variance s^2. It is not yet necessary to make any further assumptions regarding the particular form of the error probability distributions in order to derive the main regression results.

The first task is to estimate the variance s^2 of the error terms e_i from the n observations at hand. Given the assumptions implicit in equations (1)-(3) and the independent, identically distributed nature of the random error terms, what is the best unbiased estimate of the variance in the error terms? As we shall show, the best unbiased estimate is given by the expression

Equation 6

To demonstrate that the estimator in equation (6) is the best unbiased estimate of the variance of the error terms, we begin by constructing the individual terms of the equation. We begin by developing an expression for the term

Equation 7

in equation (6). To do so, we begin by writing

Equation 8

We can develop an expression for the term (b^ - b) in equation (8), the difference between the estimate b^ of equation (5) and the true parameter b, by substituting the definition of the observation y_i from equations (1)-(3) into equation (5) and rearranging to the form

Equation 9

Substitution of equation (9) into equation (8) yields the result

Equation 10

If we square the expression in equation (10) and distinguish those terms that contain the expression e_i^2 from terms that contain only "cross terms" e_i e_j (with i not equal to j), we obtain the following equation

Equation 11

where

Equation 12

Substituting equation (11) into the postulated estimator in equation (6) for the variance of the error term, we obtain the following expression for the estimated variance in the error term

Equation 13

To show that the expression in equation (13) is an unbiased estimate of the true variance s^2, we must compute the expected value of equation (13) with regard to the probability distributions characterizing the error terms. Recalling that the error terms are independent and identically distributed with mean zero, the expected value of the "cross terms" is zero because the products e_i e_j integrate to zero, while the expected value of the e_i^2 terms is s^2. Given this insight, the expected value of the estimator in equation (13) is

Equation 14

where the notation E[.] denotes expectation with regard to the error distributions. If we sum the expectation terms in equation (14) over all observations n, we obtain the final result for estimated variance in the pavement performance index

Equation 15

Equation (15) implies that the expression in equation (6) is an unbiased estimate of the variance of the error distributions.
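The unbiasedness result can be checked numerically. The sketch below assumes equation (6) divides the sum of squared residuals by n-1 (consistent with the n-1 term the text attributes to equation (18)) and, purely for the purpose of simulation, draws normally distributed errors; the derivation itself does not require normality. All names and parameter values are illustrative.

```python
import random


def error_variance_estimate(xs, ys):
    """Assumed form of equation (6): residual sum of squares divided by n - 1."""
    n = len(xs)
    b_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    sse = sum((y - b_hat * x) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 1)


def average_estimate(true_b, true_sigma, xs, trials, seed=0):
    """Average the variance estimate over many simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Simulate y_i = b * x_i + e_i with independent zero-mean errors.
        ys = [true_b * x + rng.gauss(0.0, true_sigma) for x in xs]
        total += error_variance_estimate(xs, ys)
    return total / trials


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # five observation times, as in a small sample
avg = average_estimate(true_b=-0.05, true_sigma=0.1, xs=xs, trials=20000)
print(f"true variance 0.0100, average estimate {avg:.4f}")
```

Averaged over many simulated samples, the estimate should settle close to the true error variance even though each individual five-point estimate is noisy.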

Armed with the estimate in equation (6) of the variance of the error terms, we now turn to the critical question: How much error is there in our estimate of the pavement deterioration parameter b? Do the n observations provide an accurate and reliable estimate of the deterioration parameter b or not? The definition of the variance of the parameter b is the expectation of the square of the expression in equation (9), i.e.,

Equation 16

where again it is understood that the expectation is taken with regard to the independent, identically distributed error terms. Taking such expectation, we obtain the following expression for the variance in the pavement deterioration parameter b

Equation 17

Combining equations (15) and (17) and using the definition in (6) of the estimator that was verified in (15) to be unbiased, we can write

Equation 18

A measure of the "goodness of fit" is immediately evident from equation (18), which represents the variance in the estimate of the parameter b. Notice that the numerator is simply the sum of squared errors, which by construction is always set at its minimum possible value. By contrast, the denominator of the expression contains the term n-1, which grows linearly with the number of observations n, multiplied by the sum of the x_i^2, which also grows as observations are added. In short, the denominator of equation (18) grows faster than the numerator as the number of observations increases: the sum of squared errors grows roughly in proportion to n, so its ratio to n-1 stays roughly constant while the sum of the x_i^2 keeps growing. This implies the critically important insight in regression analysis that the variance in the parameter estimate b declines as the number of observations increases, i.e., as the quantity of data assembled increases.
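The decline in variance with sample size can be illustrated with a short sketch. It assumes equation (18) has the form described in the text, i.e., the sum of squared errors divided by (n-1) times the sum of the x_i^2. The data are synthetic (a known slope plus alternating errors of fixed size) and are not the report's Table 1 or Table 2.

```python
def slope_variance(xs, ys):
    """Assumed form of equation (18): SSE / ((n - 1) * sum(x_i^2))."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    sxx = sum(x * x for x in xs)
    b_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    sse = sum((y - b_hat * x) ** 2 for x, y in zip(xs, ys))
    return sse / ((n - 1) * sxx)


def synthetic_observations(n, slope=-0.05, error_size=0.1):
    """Hypothetical data: y = slope * t plus an error alternating in sign."""
    ts = [float(t) for t in range(1, n + 1)]
    ys = [slope * t + (error_size if i % 2 == 0 else -error_size)
          for i, t in enumerate(ts)]
    return ts, ys


var_5 = slope_variance(*synthetic_observations(5))
var_15 = slope_variance(*synthetic_observations(15))
print(f"variance of slope estimate, n=5:  {var_5:.6f}")
print(f"variance of slope estimate, n=15: {var_15:.6f}")
```

With errors of the same size in both cases, the fifteen-observation variance comes out far smaller than the five-observation variance, because both n-1 and the sum of the x_i^2 are larger.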

This makes sense intuitively: the more observations we have, the better the fit. The more data we have, the more accurate our estimate of the parameter b and the better our assessment of whether the linear model y = bx is descriptive of the pavement deterioration process being estimated. It is obvious even in this highly simplified example why there is strong motivation indeed to assemble mountains of observations and reams of data; the more data, the better the statistical fit and the more definitive our judgment as to the correctness of the postulated model.

Before leaving the regression analysis, we will articulate how much of the observed variation in the performance parameter y can be "explained" by variations in the independent variables x and how much of the observed variation in y is left unexplained. To begin, the sum of squared errors can be written

Equation 19

The term in brackets on the right hand side of equation (19) has a critically important interpretation. If the dependent variable y were completely explainable in terms of the independent variable x, the sum of squared errors would be zero. The term in brackets represents that fraction of the total variability in y that is not explainable by variability in the independent variable x. The rightmost term inside the brackets therefore represents that fraction of the total variability in y that is explainable by variability in the independent variable x. It is sometimes termed the R^2 value:

Equation 20

When the R^2 value is near 0, virtually none of the variability in y is explainable by variability in x. When the R^2 value is near 1, virtually all of the variability in y is explainable by variability in x. We say that the explanatory power of the independent variable x is significant for values of the R^2 measure of, say, 0.8 or higher. A deficiency in the value of R^2 might reflect the fact that too little data has been gathered, or it might reflect the fact that the pavement deterioration function y = f(x,b) contains too few independent variables, i.e., too few elements of x, to explain the variation in the dependent variable y.
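As a sketch of how this measure might be computed for the no-intercept model, assume equation (19) factors the sum of squared errors as sum(y_i^2) * [1 - (sum x_i y_i)^2 / (sum x_i^2 * sum y_i^2)], so that the rightmost bracketed term is the R^2 value of equation (20). This is an "uncentered" R^2, which differs from the centered version commonly reported when a model includes an intercept; the function name and data are hypothetical.

```python
def r_squared_through_origin(xs, ys):
    """Assumed form of equation (20): (sum x*y)^2 / (sum x^2 * sum y^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain a nonzero value")
    return (sxy * sxy) / (sxx * syy)


# y is an exact multiple of x: everything is explained.
perfect = r_squared_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# x and y are orthogonal: nothing is explained.
none = r_squared_through_origin([1.0, -1.0], [1.0, 1.0])

print(perfect, none)
```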

In closing this section, we should reiterate that all the equations thus far implicitly assume that the variable b is a scalar. It is very straightforward to extend the preceding discussion to the case in which the independent variable x is a vector and the parameter b of the pavement deterioration function is also a vector. The results have precisely the same general flavor as the foregoing results, but the notation is a bit more complex and cumbersome. We will not discuss the extension here to vectors of independent variables (called multiple regression) but rather will concentrate on eliminating the difficulties with the regression approach presented thus far.

2.1.2 Numerical Example

This section introduces an extremely simple prospective model of pavement deterioration, one that will serve well throughout this section to illustrate the classical (regression) approach, indicate its inherent weakness, and show how the much better Bayesian statistical approach overcomes the inherent difficulties.

We consider here the simplest possible pavement deterioration model, one that considers only a single pavement design and in which a pavement performance index (denoted y) is a linear function of time t. The postulated structural model of deterioration will therefore be

Equation 21

Notice there is but one parameter of this model, namely the "slope" parameter b, and our job is to estimate it from a series of pavement observations.

In attempting to estimate the coefficient b of the pavement deterioration function in equation (21), suppose we have through long term pavement monitoring assembled the estimates given in Table 1.

Table 1

Regression analysis [i.e., application of equation (5)] gives the estimate of the parameter b to be -0.04693. The variance in the estimate of the parameter b is, per equation (18), variance(b) = 0.000174, meaning that the standard deviation of the error distribution for the parameter b (termed the standard error in b) is 0.013211. In lay terms, therefore, the model parameter b is estimated to be -0.04693 +/- 0.013211. The +/- term represents one standard deviation above and below the mean estimate of the model parameter b.

To understand the significance of the model parameter b and the uncertainty therein, we ask the critical question: How long will it take for the pavement to deteriorate to a pavement performance index level of 0.2? Assuming 0.2 is the minimum serviceability level of a pavement below which it must be replaced, how long will it take for the pavement to deteriorate to the point at which the performance index is 0.2? The answer is easily calculated by substituting the estimate of the model parameter b into equation (21)

Time to 0.2 index = 17.1 years.

Therefore, for the particular pavement design under consideration, we predict it will take 17.1 years for the pavement to deteriorate to the minimum serviceability level (0.2), at which time it will have to be replaced. Figure 2-1 plots the pavement deterioration function at the expected value of the model parameter b, and Figure 2-2 plots the pavement deterioration function at plus and minus one standard deviation.

Figure 2-1

Unfortunately, however, the sample size here is small. There are only five observations from which to infer the parameter b. The estimate of 17.1 years deterioration time is poor, as we shall now illustrate. If the slope differs from the mean estimate of -0.04693 by +/- 0.013211 (the standard deviation), the time until the pavement deteriorates to an index of 0.2 ranges from 13.3 years at the low end to 23.7 years at the high end. This represents approximately a 30 percent prediction error in the time it takes the pavement to deteriorate to an unserviceable level. Expressed alternatively, the pavement may have to be rehabilitated as frequently as every 13.3 years or as infrequently as every 23.7 years. The present value life cycle cost of the former is tremendously higher than that of the latter, so much so that errors of this magnitude cannot be tolerated. Assuredly an estimate this inaccurate will be rejected by pavement engineers in favor of "conventional wisdom."
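The arithmetic behind these deterioration times can be sketched as follows. The sketch assumes equation (21) has the form y = 1 + b*t, i.e., the performance index starts at 1.0 when t = 0; that starting value is an assumption, chosen because it reproduces the 13.3 and 23.7 year bounds from the slope and standard error reported above. With the rounded slope it gives a central value of about 17.0 years, close to the 17.1 years quoted (the small gap presumably reflects rounding in the reported slope).

```python
def years_to_threshold(slope, threshold=0.2, initial_index=1.0):
    """Solve initial_index + slope * t = threshold for t (assumed linear model)."""
    if slope >= 0:
        raise ValueError("slope must be negative for the index to decline")
    return (threshold - initial_index) / slope


b_hat = -0.04693    # slope estimate reported in the text
std_err = 0.013211  # standard error of the slope reported in the text

central = years_to_threshold(b_hat)
fastest = years_to_threshold(b_hat - std_err)  # steeper decline, shorter life
slowest = years_to_threshold(b_hat + std_err)  # gentler decline, longer life

print(f"central estimate: {central:.1f} years")
print(f"one standard error range: {fastest:.1f} to {slowest:.1f} years")
```

Because the time to threshold is inversely proportional to the slope, a symmetric uncertainty in b translates into an asymmetric range in years: the upper bound lies farther from the central value than the lower bound.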

Figure 2-2

The R^2 value associated with the foregoing estimate is, per equation (20), only 0.7593. Such a low R^2 value indicates that not enough of the variability in the pavement performance index is explained by variability in the independent variable (time). There are as yet unidentified variables such as weather, traffic, design, and so forth that confound the estimate. Effort to incorporate those additional variables is probably needed. To facilitate comparison with future cases, the classical regression analysis with five observations has yielded the following results:

Results

Before proceeding onward to analyze the foregoing data using Bayesian econometric methods, consider what would happen if we had fifteen (15) data points instead of the five presented. Table 2 contains the first five data points from the previous example and adds ten more observations for a total of fifteen.

Table 2

In this expanded fifteen sample example, the estimated parameters are

Estimated Parameters

Notice how much lower the standard deviation in the model parameter b is than when we had only five observations, largely because of the impact in the denominator of equation (18) of the term n-1.

The beneficial effects of gathering more data are clearly evident in this example. Notice that the range of uncertainty in the time until the pavement performance index reaches 0.2 has fallen from approximately 30 percent (when only five data points are available) to approximately 15 percent (when fifteen data points are available), reflecting the increased accuracy in the estimate of the model parameter b. Figure 2-3, the counterpart of Figure 2-1, shows the range of uncertainty given fifteen data points relative to the five data points discussed previously.

Figure 2-3

It is worth reemphasizing that the estimate b = -0.04605 does not recognize in any way whatsoever any information that might have existed prior to the fifteen observations. Even if 20,000 observations had been made prior to the fifteen observations being analyzed, those prior 20,000 observations would be completely ignored by the regression approach illustrated here. This is one of the major pitfalls of classical regression analysis; it is fundamentally incapable of recognizing any validity in previously existing estimates.

2.1.3 Pitfalls with Regression Analysis

Alas, what do we have if we do not have enough data or enough variables? What if it is extremely expensive and painstakingly slow to assemble sufficiently high quality data? What if it takes fifteen or more years of highly controlled, continuously funded effort? What if there are so many prospective causal variables (independent variables) that they cannot realistically be measured long enough or extensively enough at the given number of sites? What if the data that exist are not good enough? Regression methods offer no answers other than "the findings are not yet statistically significant" or "the variance in the model parameters is too high." Regression methods offer no realistic interim way to quantify whatever uncertainty exists in results based on then-existing data, much less transfer a representation of that uncertainty to real-world pavement management decisions. With regression methods, there is no solution until there is a final solution. The method we will outline below explicitly quantifies the uncertainty and shows how to transfer that uncertainty to today's decision makers.

The belief in regression analysis that the answer lies in the data frequently prompts organizations to undertake massive data gathering exercises. "If only we could gather a massive, accurate, comprehensive data base, we could then make statistically reliable parameter estimates and determine correct functional forms to quantify pavement deterioration." Such a perception is fraught with difficulties. Decisions cannot await final data. Data can be misleading and highly devalued relative to initial expectations. Highway departments have a great deal of knowledge today even though they might have limited data. Decisions must be made today based on the best data and/or knowledge available today. The challenge is to infiltrate today's decisions with data as it emerges from the C-LTPP program, gradually increasing the quality of those decisions. Such is the objective of the Bayesian statistical approach we outline and propose below. Neither the Canadian nor the United States LTPP programs can hope to measure enough variables at every site to produce statistically significant estimates for all prospective causal variables.

2.2 A BETTER WAY: BAYESIAN STATISTICAL METHODS

This section presents a highly simplified example that contrasts the regression methods summarized above with Bayesian statistical methods. The objective is to present the Bayesian concept in a very simple context, yet a context that is analogous to the pavement monitoring program. The simple context selected here is analogous in the sense that it involves making successive observations of an event and after each observation updating one's estimate of the probability of occurrence of that event. After a large number of observations of the event, we will have the standard regression estimate dictated by the frequencies of a large number of observations. However, after a handful of observations, we will have an improved estimate suitable for use in today's decision making. Again, we submit that no matter how much data is collected it will still in essence be a "handful."

Suppose that a new missile has been produced, and there is no history of operation of the missile. Never has a missile of this type been launched before. It is our job as quality control engineer to determine the probability of success for any given flight. When we accept the assignment, we believe based on engineers' assurances that a missile, randomly selected, will fail with probability 1/3. However, we are not particularly confident of the engineers' prior probability assessment. We believe it to be correct only to within an error of 2/9. That is, the failure probability could be as high as 1/3 + 2/9 = 5/9 or as low as 1/3 - 2/9 = 1/9.

To obtain a more accurate estimate of the probability of success, we decide to witness successive launches of the new missile, record whether they are successful or not, and attempt after each launch to derive a better estimate of the probability of failure. Suppose we travel to the firing range and observe that the launch of the very first missile is a failure. After witnessing the first launch to be a failure, how should we revise our estimate of the probability of failure? Regression analysis would argue that the probability of failure should be 100 percent. One launch has been observed, and it was a failure. Classical statisticians would of course caveat that their estimate is plagued by "small sample size" problems and therefore that the estimate cannot be considered reliable or definitive. They would also argue that many more observations will be necessary before they can amass a statistically significant estimate of the probability of failure. Notwithstanding their small sample size caveats, regression methods would predict the probability of failure after witnessing one failure and zero successes to be 100 percent. They would utterly ignore any information other than the single test failure.

By contrast, Bayesian statistics would estimate the probability of failure to be 2/5 after witnessing the single failure, and the variance in this estimate would be estimated at 6/25. Speaking loosely, Bayesian statisticians would believe the probability of failure to be 2/5 plus or minus 6/25, i.e., the probability could be as low as 4/25 or as high as 16/25. Interestingly, the probability of failure has risen from 1/3 (before observing the single failure) to 2/5 after observing the failure. This is quite different from the 100 percent probability of failure estimated by regression methods on the basis of a single observation (subject to the caveat of small sample size). Bayesian statistics provides a systematic method to quantify rather than qualify small sample size difficulties.

Suppose we witness a second launch, and it is a failure just as the first launch was. Classical statisticians would continue to estimate the probability of failure at 100 percent, their estimate bolstered by two consecutive failures without a success. Bayesian statisticians would revise their estimate of the probability of failure from its level of 2/5 upward to 5/11 and would revise the variance of their estimate from its level of 6/25 to 30/121. Two successive failures, and the probability of failure estimated by Bayesian methods would still be below 50 percent (i.e., 5/11). As this simple example shows, the Bayesian approach does not summarily discard the initial estimate of the probability of failure (i.e., 1/3). On the contrary, the Bayesian approach places a good deal of credence in the prior estimate of 1/3 probability of failure, adjusting it systematically and gradually as new evidence comes in. Before and after every successive observation, the Bayesian method provides an explicit estimate of the probability of failure and the uncertainty inherent in that estimate. We reiterate that the Bayesian approach continues to quantify the true implications of small sample sizes rather than simply caveating them away.
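The sequence of estimates in this example can be reproduced with a short sketch. The report does not state the form of the prior; the sketch assumes a Beta prior with pseudo-counts of 3 failures and 6 successes, an assumption chosen because it reproduces the quoted failure probabilities 1/3, 2/5, and 5/11 under the standard Beta-Bernoulli update. The quoted "variance" figures 2/9, 6/25, and 30/121 match p(1 - p) evaluated at each estimate rather than the variance of the Beta distribution itself, so that is what the sketch computes. All names are illustrative.

```python
from fractions import Fraction


def update(failures, successes, launch_failed):
    """Beta-Bernoulli update: add one pseudo-count for the observed outcome."""
    if launch_failed:
        return failures + 1, successes
    return failures, successes + 1


def failure_probability(failures, successes):
    """Mean of the Beta distribution with the given pseudo-counts."""
    return Fraction(failures, failures + successes)


def spread(p):
    """p * (1 - p), which matches the 'variance' figures quoted in the text."""
    return p * (1 - p)


state = (3, 6)  # ASSUMED prior pseudo-counts (failures, successes)
history = [failure_probability(*state)]
for launch_failed in (True, True):  # two observed launches, both failures
    state = update(*state, launch_failed)
    history.append(failure_probability(*state))

for p in history:
    print(f"P(failure) = {p}, spread = {spread(p)}")
```

Each failure moves the estimate upward, but only by as much as one additional observation warrants against the nine pseudo-observations embodied in the assumed prior.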

After a very large number of observations have occurred, the results of the regression and Bayesian methods would be identical. The two methods converge to the same answer as the amount of data becomes large. Implicitly, as a large quantity of data enters the Bayesian adjustment process, the effect of the prior estimate of 1/3 is superseded by the preponderance of new evidence. However, while the amount of new data remains small, only the Bayesian method provides an operationally useful, credible, sensible estimate. We stress use of the word evidence. The data assembled by watching successive missile launches is evidence, i.e., information that changes one's initial state of knowledge but does not necessarily obliterate it. Regression methods completely obliterate any initial state of knowledge; Bayesian statistical methods update it with new evidence as it comes in. We believe this thinking to be consonant with that of prudent pavement managers.
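The convergence described above can be checked numerically. The sketch below compares the observed failure frequency (the estimate that uses no prior information) with the posterior mean under an assumed Beta(3, 6) prior, chosen only because its mean is the 1/3 prior estimate used in the example. The launch record, in which exactly half of all launches fail, is hypothetical, as are the function names.

```python
from fractions import Fraction

# Assumed Beta(3, 6) prior (mean 1/3); an illustrative choice, not from the report.
PRIOR_FAILURES, PRIOR_SUCCESSES = 3, 6


def classical_estimate(failures, launches):
    """Observed failure frequency; uses no prior information."""
    return Fraction(failures, launches)


def bayesian_estimate(failures, launches):
    """Posterior mean of the failure probability under the assumed prior."""
    return Fraction(PRIOR_FAILURES + failures,
                    PRIOR_FAILURES + PRIOR_SUCCESSES + launches)


def gap(failures, launches):
    """Absolute difference between the two estimates."""
    return abs(classical_estimate(failures, launches)
               - bayesian_estimate(failures, launches))


if __name__ == "__main__":
    # Hypothetical record in which half of all launches fail.
    for launches in (2, 20, 200, 2000, 20000):
        failures = launches // 2
        print(launches,
              float(classical_estimate(failures, launches)),
              float(bayesian_estimate(failures, launches)),
              float(gap(failures, launches)))
```

The gap between the two estimates shrinks as the number of launches grows, because the fixed prior pseudo-counts carry less and less weight relative to the observed data.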

Before relating this simple example to the pavement monitoring problem at hand, it is important to note that the Bayesian approach yields a probability estimate suitable for decision making after every missile observation, be it a single observation or ten million observations. The Bayesian method provides a probability which embodies the best then-current information before and after every observation. There is no sense that one must gather mountains of data before anything definitive can be said. One need not definitively overcome the "small sample size" problem before presenting real-world results and supporting real-world decision making. It is this ability to generate usable interim results, i.e., to generate meaningful results from whatever long term pavement performance data has been assembled to date, and apply them to immediate decision making that motivates use of the Bayesian approach for the long term pavement monitoring program.

Observing missiles as they succeed or fail and adjusting one's estimate of the failure probability is analogous to monitoring pavement performance and adjusting one's estimate of the deterioration mechanism as such monitoring proceeds. Regression methods would argue that many, many missiles must be observed before anything meaningful can be said regarding the failure probability. By analogy, regression methods would argue that the C-LTPP program must gather mountains of data from myriad sites before statistically significant estimates of pavement deterioration can be obtained. By contrast, the Bayesian statistical approach we advocate in this report allows emerging data from the limited number of sites that comprise the C-LTPP program to affect current knowledge, supplementing and displacing it slowly over time until it is superseded altogether after enough measurements have been amassed over the next fifteen or more years.

We do not believe it prudent or politically viable to structure the C-LTPP program so that it is compelled to "wait until all the data is in" before the program begins to show palpable benefits by explicitly supporting real-world pavement management decisions. Indeed, the political process has much too high a discount rate to continue to fund monitoring ten or fifteen years into the future in the face of mounting budget pressure and lack of definitive results. Promises that the "golden age" of pavement science will emerge fifteen years hence are all but lost on politicians and citizens clamoring for more serviceable highways today. Imperfect as our understanding of pavement deterioration might be, ongoing efforts such as C-LTPP must deliver interim benefits quickly, efficiently, and reliably to today's decision makers, and those interim benefits must increase and improve over time. The regression approach does not support this objective, while the Bayesian approach does. We always know the probability that the missile will fail.

It is well to emphasize the major contributions the C-LTPP program can make, even during its early years. In spite of the small sample size problems that will beset the C-LTPP program, most acute in the early years, the LTPP program data is sure to be better than anything else available, and we must strive to incorporate it as quickly as possible in pavement decision making. It is critically important that the monitoring program be designed to support broad and quick impacts on pavement management. The Bayesian methodology we will put forth in the next section allows immediate and ongoing use of the results of the pavement monitoring program and allows C-LTPP program managers to demonstrate and quantify the immediate benefits of their efforts.

The remainder of this section contains a simple technical discussion of the Bayesian statistical approach as a basis to understand the framework we will propose later in this report. The essence of the Bayesian approach is disarmingly simple, dating back to Reverend Thomas Bayes' seminal publication in 1763. Bayes' Theorem is a statement about conditional probability, a statement we will apply to the random terms introduced in the regression formulation in Section 2.1. Denote

D = observed data (i.e., all measurements collectively from the C-LTPP program sites)

b = parameters in the pavement deterioration function f(x,b)

I = prior information (i.e., information known today before any monitoring or measurement occurs)

{b | I} = prior probability distribution over the model parameters b. This probability distribution embodies what is known about pavement deterioration (as embodied in the parameters of the pavement deterioration function) before any C-LTPP program data are assembled. The regression approach has no such concept; it assumes implicitly that nothing substantive is known in advance.

{D | b,I} = probability distribution that the model f(x,b) + e using the parameter estimate b will generate the observed data D. This term is often called the "likelihood function," emphasizing that it quantifies the probability that the model using the parameters b will generate the observations D.

{b |D,I} = posterior probability distribution over the model parameters b after the data D are observed, i.e., the probability distribution over the parameters b conditional on observing the data D.

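{D | I} = unconditional probability distribution over the data D, i.e., the likelihood averaged over the prior distribution of the parameters b:

$$\{D \mid I\} = \int \{D \mid b, I\}\,\{b \mid I\}\,db \qquad \text{(Equation 22)}$$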

Expressed succinctly, the "answer" after the data are assembled is the posterior probability distribution {b | D,I}. The information we have before initiating the pavement monitoring program is {b | I}. Bayes' Theorem implies that the probability distribution over the parameter b after observing the data D (the "answer" we wish to obtain) is proportional to the probability distribution over the parameter b before observing the data D times the likelihood that the model using the parameter b will generate the observed data D. That is, the sought-after posterior distribution is proportional to the product of the prior distribution and the likelihood function, with the unconditional distribution {D | I} serving as the normalizing constant. Expressed mathematically, Bayes' Theorem is

$$\{b \mid D, I\} = \frac{\{D \mid b, I\}\,\{b \mid I\}}{\{D \mid I\}} \qquad \text{(Equation 23)}$$

We shall use this simple result extensively in the next section.
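As an illustration of how the prior, likelihood, and posterior defined above fit together, the following is a minimal sketch that applies Bayes' Theorem on a discrete grid of candidate parameter values. Everything specific in it is an assumption made for illustration: the one-parameter linear form f(x,b) = b * x, the Gaussian error term with a known standard deviation, the shape of the prior, and the data values. None of these come from the report, and the function names are hypothetical.

```python
import math


def gaussian_pdf(residual, sigma):
    """Density of a zero-mean Gaussian error term e at the given residual."""
    return math.exp(-0.5 * (residual / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def likelihood(b, xs, ys, sigma):
    """{D | b, I}: density of the observed ys under the assumed model y = b * x + e."""
    value = 1.0
    for x, y in zip(xs, ys):
        value *= gaussian_pdf(y - b * x, sigma)
    return value


def posterior(grid, prior, xs, ys, sigma):
    """{b | D, I} on a grid: prior times likelihood, divided by {D | I}."""
    unnormalized = [p * likelihood(b, xs, ys, sigma) for b, p in zip(grid, prior)]
    evidence = sum(unnormalized)  # discrete stand-in for the integral {D | I}
    return [u / evidence for u in unnormalized]


def mean(grid, weights):
    """Expected value of b under a discrete distribution on the grid."""
    return sum(b * w for b, w in zip(grid, weights))


if __name__ == "__main__":
    grid = [i / 10 for i in range(21)]  # candidate b values 0.0, 0.1, ..., 2.0
    # Hypothetical prior {b | I}: belief concentrated near b = 0.5.
    raw = [math.exp(-0.5 * ((b - 0.5) / 0.3) ** 2) for b in grid]
    total = sum(raw)
    prior = [r / total for r in raw]
    # Hypothetical observations D.
    xs = [1.0, 2.0, 3.0]
    ys = [1.1, 1.9, 3.2]
    post = posterior(grid, prior, xs, ys, sigma=0.5)
    print("prior mean of b:    ", round(mean(grid, prior), 3))
    print("posterior mean of b:", round(mean(grid, post), 3))
```

With only three observations, the posterior mean lies between the prior mean and the value the data alone would suggest, which is the gradual updating behavior described in this section.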
