Design of a Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program
Section 2 TECHNICAL DISCUSSION OF THE PAVEMENT MANAGEMENT PROBLEM

This section offers a more systematic and technical discussion of the problem of long term pavement performance monitoring and its relationship to pavement management and design decisions. Understanding this technical discussion is critical to understanding the desiderata of pavement monitoring, performance prediction, and pavement design and management.

The ultimate objective of the long term pavement monitoring and measurement program is to provide data that allow specification of a functional relationship between a set of pavement performance variables (denoted y and called by statisticians "dependent" variables) and a set of precursor variables (denoted x and called by statisticians "independent" variables), i.e., y = f(x). In the long term pavement performance monitoring program, the independent variables x will include all variables monitored, i.e., those variables that can prospectively affect pavement performance, such as weather, traffic, loads, temperature, and maintenance treatments applied. The dependent variables y will include those pavement performance variables that must be predicted, e.g., roughness, ride comfort measures, imposed vehicle operating costs, transverse cracking, and longitudinal cracking. The predominant objective of a protracted monitoring, tracking, and measurement activity is to provide an increasingly accurate estimate of the fundamental pavement performance relationship y = f(x) as more observations are made.

Perfect, certain, and complete understanding of a complex and confounded process such as pavement deterioration is impossible, so we must include in the fundamental pavement performance equation an "error term," a random variable that acknowledges the inevitable imperfections in measurement or in the specification of the function f(x).
For example, there may be other variables beyond x that influence y, but we may have inadvertently or systematically left them out. To quantify this intrinsic uncertainty explicitly, we write y = f(x) + e, where e is an error term, i.e., a random term, that embodies what is not known. Assigning an analytical form to the error term e is a critical element of a statistical procedure, one that will be at the center of our discussion in this section. It is critically important to note that pavement performance y is a random variable because e is a random variable. In effect, we have defined a model of the pavement deterioration process that contains a systematic structural element f(x) and a random element e such that the combination of the two comprises a probabilistic model of pavement deterioration. This structure recognizes the reality that no amount of pavement monitoring, no matter how intensive, can eliminate uncertainty: monitoring can reduce uncertainty but never eliminate it. We shall return to this theme repeatedly: the output of our model of the pavement deterioration process, inferred from the long term pavement monitoring program, is a probability distribution over pavement performance conditional on the list of independent variables x.

The message that no amount of pavement monitoring, no matter how intense or comprehensive, can ever eliminate uncertainty is not a message that long term pavement performance monitoring is futile. On the contrary, long term pavement performance monitoring can yield tremendous benefits if we recognize that its primary function is to reduce uncertainty systematically, though never to eliminate it, and that the primary objective of its users is to make better decisions as uncertainty is reduced.

Operationally, we characterize the
pavement deterioration function f(.) as a mathematical function that has as its arguments not only the independent variables x but also a number of numerical parameters, designated collectively b. We will write the pavement deterioration relationship as y = f(x,b) + e to emphasize the distinction between independent variables x and parameters b. The operational aim of the pavement monitoring program is to

· make a series of observations (x1,y1), (x2,y2), ..., (xn,yn) of the independent variables and the corresponding dependent variables; this series of observations will be assembled over time from a number of monitoring sites,
· hypothesize a specific mathematical function f(x,b) that characterizes pavement deterioration,
· use the foregoing series of observations to draw quantitative conclusions about the function f(.) and its parameters b,
· assess the appropriateness of the postulated function f(.) and the estimated parameters b as a representation of pavement deterioration, and
· deliver the postulated function f(.) and the estimated parameters b to pavement management systems for use in decision making.

Once the observations are made, there are two basic approaches to the problem of estimating the "best" coefficients b and determining the "best" deterioration function f(x,b): the classical statistical or regression approach and the Bayesian approach. Preliminary research in the SHRP program in the United States has focused entirely on the "plain vanilla" multiple regression approach. For reasons we will articulate in this report, we believe the plain vanilla multiple regression approach will prove to be unworkable, both because of the small sample size problems that will beset Canada and the United States alike and because it cannot generate robust probabilistic output suitable for use by explicit, quantitative pavement management systems. As we will show here, the Bayesian approach is more fruitful and much more capable of ensuring the success of long term pavement monitoring activities because it specifically overcomes the small sample size problem and systematically produces increasingly accurate probabilistic estimates of pavement deterioration. The remainder of this section presents the regression approach to analyzing the observed data and then offers the much more useful Bayesian statistical approach, which can overcome the intrinsic small sample size and other problems that will systematically beset the long term pavement monitoring program.

2.1 THE CLASSICAL STATISTICAL (REGRESSION) APPROACH TO DEVELOPING PAVEMENT DETERIORATION FUNCTION AND INPUT PARAMETERS

2.1.1 Methodology

The most common approach that has been
attempted in long term pavement monitoring and other contexts has been what we will characterize as the traditional classical statistical approach, which we shall term the regression approach. The approach is appealing because of its simplicity and its ubiquitous understanding and acceptance, but as we shall see, it cannot realistically meet the political and technical needs of long term pavement performance monitoring and inference. In particular, regression methods cannot realistically deliver interim results, i.e., work in progress, in a form that can support real-world decision making. Regression methods are destined to suffer from a potentially fatal "critical path" problem: all statistics must be in and all statistical analysis must be complete before information from the pavement monitoring program can definitively affect real-world pavement decisions. Classical statistical estimation must be complete, and the number of underlying observations must be sufficiently large (and must span a sufficiently long period), before statistical fitting methods can support real-world decisions.

Classical statistical methods encourage the overly simplified view that all the CLTPP program needs to do is collect mountains of data under carefully structured conditions, wait the fifteen or more years until definitive life cycle data are in, thereafter perform standard regression analysis to determine what those data imply for pavement deterioration, and thenceforth know with certainty the true pavement deterioration mechanism at work. In short: "Gather enough 'clean' data so that the statistical methods to be applied some fifteen years hence will be definitive."

Such an approach is abstract statistical idealism. Even if funding agencies were willing to bet fifteen years of funding on the premise that future results will be definitive, we believe the small sample size and other statistical problems intrinsic in the LTPP program design will preclude definitive results. Rather than "betting the farm" that fifteen years' worth of careful data gathering will be funded and will solve the problem, we have instead designed a more pragmatic approach that can provide continually improving results throughout the entire history of the CLTPP program, beginning today and culminating when the program ultimately terminates. Our approach, which will use precisely the same observations as regression techniques, will allow continuously improving estimates of the pavement deterioration function f(.) and its parameters, beginning with the very first observations and continuing indefinitely thereafter.

If we think of long term pavement performance monitoring as "narrowing" the range of uncertainty (as the Bayesian approach does), we should expect that the early results of the CLTPP program will narrow the range of uncertainty only to a modest degree and that later results will induce greater and greater narrowing.
However, the degree of narrowing even from the first CLTPP data will be discernible and significant, and our recommended approach will quantify it and deliver it to pavement management systems in a form that is immediately useful for decision making.

The approach we will put forth here extends the regression approach in a straightforward fashion and overcomes the troublesome small sample size and structural difficulties of regression analysis.
Indeed, regression results as conceived in the SHRP documentation can be obtained as special cases of the more general Bayesian statistical results we will present and advocate below. The remainder of this section contains an elementary mathematical discussion of the regression method to serve as a backdrop against which to motivate the proposed Bayesian approach.

For simplicity of exposition, we will assume that the postulated pavement deterioration function is linear, i.e., y = f(x,b) can be written y = bx. (No generality will be lost by making this simplifying assumption. We can easily generalize to nonlinear pavement deterioration functions as well as to linear deterioration functions with nonzero intercepts.) The n observations on y and x (these observations are the "product" of the pavement monitoring activities) prescribe a system of equations

$$y_1 = b x_1 + e_1 \tag{1}$$
$$y_2 = b x_2 + e_2 \tag{2}$$
$$\vdots$$
$$y_n = b x_n + e_n \tag{3}$$

The aim of regression techniques is to calculate the parameter b that, in an overall sense, would most accurately replicate the observed data using the foregoing system of equations. The most common technique for finding the best parameter b is the method of least squares, which seeks the parameter b that minimizes the sum of the squares of the error terms, i.e., that minimizes

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b x_i)^2 \tag{4}$$

Setting the derivative of equation (4) with respect to b equal to zero gives the following least squares estimate b̂ of the parameter b:

$$\hat{b} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \tag{5}$$

It is critically important to notice that the estimate of the pavement performance parameters b depends only on observations assembled under the program. There is absolutely no consideration whatsoever of any preexisting information, expertise, experience, knowledge, or practice.

The natural question that arises is: How good an answer is the estimate in equation (5)? How much data does one need to be sure that the answer in equation (5) is accurate? If one were trying to predict the mechanism of deterioration, how much data would be enough? The answer is found by computing an unbiased estimate of the potential error, i.e., the variance, of the estimate in equation (5). In order to define the notion of variance, we must make an explicit assumption regarding the nature of the error terms in equations (1) through (3). In particular, regression methods assume that the error terms are independent, identically distributed random variables with mean zero and variance σ². It is not yet necessary to make any further assumptions regarding the particular form of the error probability distributions in order to derive the main regression results.

The first task is to estimate the variance σ² of the error distributions e_i from the n observations at hand. Given the assumptions implicit in equations (1) through (3) and the independent, identically distributed nature of the random error terms, what is an unbiased estimate of the variance of the error terms? As we shall show, an unbiased estimate is given by the expression

$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \hat{b} x_i)^2 \tag{6}$$

To demonstrate that the estimator in equation (6) is unbiased, we begin by constructing the individual terms of the equation, starting with the residual (y_i − b̂ x_i) in equation (6). Using equations (1) through (3), we can write

$$y_i - \hat{b} x_i = b x_i + e_i - \hat{b} x_i \tag{7}$$
$$y_i - \hat{b} x_i = e_i - (\hat{b} - b)\, x_i \tag{8}$$

We can develop an expression for the term (b̂ − b) in equation (8) by substituting the definition of the observation y_i from equations (1) through (3) into equation (5) and rearranging to the form

$$\hat{b} - b = \frac{\sum_{j=1}^{n} x_j e_j}{\sum_{j=1}^{n} x_j^2} \tag{9}$$

Substitution of equation (9) into equation (8) yields the result

$$y_i - \hat{b} x_i = e_i - x_i \frac{\sum_{j} x_j e_j}{\sum_{j} x_j^2} \tag{10}$$

If we square the expression in equation (10) and distinguish those terms that contain squared errors e_j² from the "cross terms" that contain only products e_j e_k with j ≠ k, we obtain the following equation

$$(y_i - \hat{b} x_i)^2 = e_i^2 - \frac{2 x_i^2 e_i^2}{\sum_j x_j^2} + \frac{x_i^2 \sum_j x_j^2 e_j^2}{\left(\sum_j x_j^2\right)^2} + C_i \tag{11}$$

where

$$C_i = -\frac{2 x_i e_i \sum_{j \ne i} x_j e_j}{\sum_j x_j^2} + \frac{x_i^2 \sum_{j \ne k} x_j x_k e_j e_k}{\left(\sum_j x_j^2\right)^2} \tag{12}$$

Substituting equation (11) into the postulated estimator in equation (6) for the variance of the error term, we obtain the following expression for the estimated variance of the error term

$$\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left[ e_i^2 - \frac{2 x_i^2 e_i^2}{\sum_j x_j^2} + \frac{x_i^2 \sum_j x_j^2 e_j^2}{\left(\sum_j x_j^2\right)^2} + C_i \right] \tag{13}$$

To show that the expression in equation (13) is an unbiased estimate of the true variance σ², we must compute the expected value of equation (13) with respect to the probability distributions characterizing the error terms. Because the error terms are independent and identically distributed with mean zero, the expected value of each cross term e_j e_k (j ≠ k) is zero, while the expected value of each squared term e_j² is σ². Given this insight, the expected value of the i-th term in equation (13) is

$$E\left[(y_i - \hat{b} x_i)^2\right] = \sigma^2 - \frac{2 x_i^2 \sigma^2}{\sum_j x_j^2} + \frac{x_i^2 \sigma^2}{\sum_j x_j^2} = \sigma^2 \left(1 - \frac{x_i^2}{\sum_j x_j^2}\right) \tag{14}$$

where the notation E[.] denotes expectation with respect to the error distributions.
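Equation (14) can be checked numerically with a small Monte Carlo simulation. The sketch below is an illustration only: the observation points x, the true slope, and σ are hypothetical values, and the errors are drawn from a normal distribution purely so that something can be simulated (the derivation itself needs only independence, zero mean, and a common variance).

```python
import random


def mean_squared_residuals(x, b_true, sigma, trials, seed=1):
    """Monte Carlo estimate of E[(y_i - b_hat * x_i)^2] for each observation i.

    Simulates y_i = b_true * x_i + e_i with e_i ~ Normal(0, sigma^2) (an assumption
    made only for the simulation), fits b_hat by equation (5), and averages the
    squared residuals over many simulated data sets.
    """
    rng = random.Random(seed)
    sxx = sum(xi * xi for xi in x)
    totals = [0.0] * len(x)
    for _ in range(trials):
        y = [b_true * xi + rng.gauss(0.0, sigma) for xi in x]
        b_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx  # equation (5)
        for i, (xi, yi) in enumerate(zip(x, y)):
            totals[i] += (yi - b_hat * xi) ** 2
    return [t / trials for t in totals]


def predicted_squared_residuals(x, sigma):
    """Equation (14): sigma^2 * (1 - x_i^2 / sum_j x_j^2) for each observation i."""
    sxx = sum(xi * xi for xi in x)
    return [sigma ** 2 * (1 - xi * xi / sxx) for xi in x]


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical observation points
sigma = 0.05                   # hypothetical error standard deviation
simulated = mean_squared_residuals(x, b_true=0.05, sigma=sigma, trials=20000)
predicted = predicted_squared_residuals(x, sigma)
for xi, s, p in zip(x, simulated, predicted):
    print(f"x={xi:.0f}  simulated={s:.6f}  equation (14)={p:.6f}")

# Summing equation (14) over all i gives (n - 1) * sigma^2, the step behind equation (15).
print(sum(predicted), (len(x) - 1) * sigma ** 2)
```

The simulated averages should agree with equation (14) to within about one percent, and the last line shows why dividing the summed residuals by n − 1 rather than n removes the bias.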
If we sum the expectation terms in equation (14) over all n observations and divide by n − 1, we obtain the final result for the expected value of the estimator in equation (6):

$$E\left[\hat{\sigma}^2\right] = \frac{1}{n-1} \sum_{i=1}^{n} \sigma^2 \left(1 - \frac{x_i^2}{\sum_j x_j^2}\right) = \frac{\sigma^2 (n-1)}{n-1} = \sigma^2 \tag{15}$$

Equation (15) implies that the expression in equation (6) is an unbiased estimate of the variance of the error distributions.

Armed with the estimate in equation (6) of the variance of the error terms, we now turn to the critical question: How much error is there in our estimate of the pavement deterioration parameter b? Do the n observations provide an accurate and reliable estimate of the deterioration parameter b or not? The variance of the estimate b̂ is defined as the expectation of the square of the expression in equation (9), i.e.,

$$\operatorname{Var}(\hat{b}) = E\left[(\hat{b} - b)^2\right] = E\left[\left(\frac{\sum_i x_i e_i}{\sum_i x_i^2}\right)^2\right] \tag{16}$$

where again it is understood that the expectation is taken with respect to the independent, identically distributed error terms. Taking this expectation (again, only the squared terms e_i² survive), we obtain the following expression for the variance of the pavement deterioration parameter estimate

$$\operatorname{Var}(\hat{b}) = \frac{\sigma^2}{\sum_i x_i^2} \tag{17}$$

Combining equation (17) with the estimator in equation (6), which equation (15) verified to be unbiased, we can write

$$\widehat{\operatorname{Var}}(\hat{b}) = \frac{\sum_{i=1}^{n} (y_i - \hat{b} x_i)^2}{(n-1) \sum_{i=1}^{n} x_i^2} \tag{18}$$

Equation (18), which represents the estimated variance of the parameter estimate b̂, provides an immediate measure of how reliably the data determine b. Notice that the numerator is simply the sum of squared errors (residuals), which by construction is at its minimum possible value. The denominator contains the term n − 1, which grows linearly with the number of observations n, and the sum of the x_i², which grows with every added observation. The numerator also tends to grow as observations are added, but only in proportion to n − 1: by equation (15), the ratio of the numerator to n − 1 stays near σ² on average. The net effect is that equation (18) behaves like σ²/Σx_i², which declines as observations accumulate. This implies the critically important insight in regression analysis that the variance in the parameter estimate b̂ declines as the number of observations increases, i.e., as the quantity of data assembled increases.
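The behavior of the numerator and denominator of equation (18) can be seen in a small simulation. The sketch below assumes hypothetical yearly observation points x = 1, ..., n, a hypothetical true slope b = 0.05, σ = 0.05, and normally distributed errors; none of these values come from the CLTPP data.

```python
import random


def fit_through_origin(x, y):
    """Least squares slope (equation 5), residual sum of squares, and equation (18)."""
    sxx = sum(xi * xi for xi in x)
    b_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    ssr = sum((yi - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    return b_hat, ssr, ssr / ((len(x) - 1) * sxx)


def average_behaviour(n, b_true=0.05, sigma=0.05, trials=5000, seed=7):
    """Average SSR, SSR/(n-1), and the equation (18) estimate over simulated data sets."""
    rng = random.Random(seed)
    x = [float(t) for t in range(1, n + 1)]  # hypothetical observation times
    sum_ssr = sum_s2 = sum_var_b = 0.0
    for _ in range(trials):
        y = [b_true * xi + rng.gauss(0.0, sigma) for xi in x]
        _, ssr, var_b = fit_through_origin(x, y)
        sum_ssr += ssr
        sum_s2 += ssr / (n - 1)
        sum_var_b += var_b
    sxx = sum(xi * xi for xi in x)
    return {
        "mean SSR": sum_ssr / trials,                 # grows roughly like (n - 1) * sigma^2
        "mean SSR/(n-1)": sum_s2 / trials,            # stays near sigma^2
        "mean Var(b_hat) estimate": sum_var_b / trials,
        "equation (17)": sigma ** 2 / sxx,            # exact variance for these x values
    }


for n in (5, 15):
    print(n, average_behaviour(n))
```

Going from five to fifteen observations roughly triples the average numerator, but the ratio SSR/(n − 1) stays near σ² = 0.0025 while Σx_i² grows from 55 to 1240, so the variance of b̂ falls by more than an order of magnitude under these assumed values.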
Intuitively, the more observations we have, the more precise our estimates become. The more data we have, the more accurate our estimate of the parameter b and the better our assessment of whether the linear model y = bx describes the pavement deterioration process being estimated. It is obvious even in this highly simplified example why there is strong motivation indeed to assemble mountains of observations and reams of data: the more data, the more precise the estimates and the more definitive our judgment as to the correctness of the postulated model.

Before leaving the regression analysis, we will determine how much of the observed variation in the performance parameter y can be "explained" by variations in the independent variable x and how much of the observed variation in y is left unexplained. To begin, substituting equation (5) into the sum of squared errors allows it to be written

$$\sum_{i=1}^{n} (y_i - \hat{b} x_i)^2 = \sum_{i=1}^{n} y_i^2 \left[ 1 - \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i x_i^2 \sum_i y_i^2} \right] \tag{19}$$

The term in brackets on the right hand side of equation (19) has a critically important interpretation. If the dependent variable y were completely explainable in terms of the independent variable x, the sum of squared errors would be zero. The term in brackets represents that fraction of the total variability in y (measured here by Σy_i², because the model y = bx has no intercept) that is not explainable by variability in the independent variable x. The rightmost term inside the brackets therefore represents that fraction of the total variability in y that is explainable by variability in the independent variable x. It is termed the R² value:

$$R^2 = \frac{\left(\sum_i x_i y_i\right)^2}{\sum_i x_i^2 \sum_i y_i^2} \tag{20}$$

When the R² value is near 0, virtually none of the variability in y is explainable by variability in x. When the R² value is near 1, virtually all of the variability in y is explainable by variability in x. As a rule of thumb, we regard the explanatory power of the independent variable x as adequate for R² values of about 0.8 or higher. A deficiency in the value of R² might reflect the fact that too little data has been gathered, or it might reflect the fact that the pavement deterioration function y = f(x,b) contains too few independent variables, i.e., too few elements of x, to explain the variation in the dependent variable y.

In closing this section, we should
reiterate that all the equations thus far implicitly assume that the parameter b is a scalar. It is very straightforward to extend the preceding discussion to the case in which the independent variable x is a vector and the parameter b of the pavement deterioration function is also a vector. The results have precisely the same general flavor as the foregoing results, but the notation is somewhat more complex and cumbersome. We will not discuss the extension to vectors of independent variables (called multiple regression) here, but rather will concentrate on overcoming the difficulties with the regression approach presented thus far.

2.1.2 Numerical Example

This section introduces an extremely simple prospective model of pavement deterioration, one that will serve throughout this section to illustrate the classical (regression) approach, indicate its inherent weaknesses, and show how the much better Bayesian statistical approach overcomes those difficulties.

We consider here the simplest possible pavement deterioration model, one that considers only a single pavement design and in which a pavement performance index (denoted y) is a linear function of time t. The postulated structural model of deterioration will therefore be

$$y = 1 - b t \tag{21}$$

Notice there is but one parameter of this model, namely the "slope" parameter b, and our job is to estimate it from a series of pavement observations. In attempting to estimate the coefficient b of the pavement deterioration function in equation (21), suppose we have assembled through long term pavement monitoring the observations given in Table 1.

Regression analysis [i.e., application of equation (5), with 1 − y in the role of y and t in the role of x] gives the estimate of the parameter b to be 0.04693. The variance in the estimate of the parameter b is, per equation (18), variance(b) = 0.000174, meaning that the standard deviation of the error distribution for the parameter b (termed the standard error in b) is 0.013211. In lay terms, therefore, the model parameter b is estimated to be 0.04693 +/- 0.013211, where the +/- term represents one standard deviation above and below the mean estimate of the model parameter b.

To understand the significance of the model parameter b and the uncertainty therein, we ask the critical question: Assuming 0.2 is the minimum serviceability level of a pavement, below which it must be replaced, how long will it take for the pavement to deteriorate to the point at which the performance index is 0.2? The answer is easily calculated by substituting the estimate of the model parameter b into equation (21):

Time to 0.2 index = 17.1 years.

Therefore, for the particular pavement design under consideration, we predict it will take 17.1 years for the pavement to deteriorate to a serviceability level (0.2) at which time it will have to be replaced. Figure 2-1 plots the pavement deterioration function at the expected value of the model parameter b, and Figure 2-2 plots the pavement deterioration function at plus and minus one standard deviation.

Unfortunately, however, the sample size here is small. There are only five observations from which to infer the parameter b, and the estimate of 17.1 years deterioration time is poor, as we shall now illustrate. If the slope differs from the mean estimate of 0.04693 by +/- 0.013211 (the standard deviation), the time until the pavement deteriorates to an index of 0.2 ranges from 13.3 years at the low end to 23.7 years at the high end, i.e., from about 22 percent shorter to about 39 percent longer than the 17.1 year estimate, or roughly a 30 percent prediction error in the time it takes the pavement to deteriorate to an unserviceable level. Expressed alternatively, the pavement may have to be rehabilitated as frequently as every 13.3 years or as infrequently as every 23.7 years. The present value life cycle cost of the former is far higher than that of the latter, so much so that errors of this magnitude cannot be tolerated. An estimate this inaccurate will assuredly be rejected by pavement engineers in favor of "conventional wisdom."

The R² value associated with the foregoing estimate is, per equation (20), only 0.7593. Such a low R² value indicates that not enough of the variability in the pavement performance index is explained by variability in the independent variable (time). There are likely as yet unidentified confounding variables, such as weather, traffic, and design, that confound the estimate, and effort to incorporate those additional variables is probably needed.
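The arithmetic of this numerical example can be reproduced with a short sketch. Table 1 is not reproduced in this text, so the fit below runs on hypothetical observations and will not return 0.04693; the time-to-threshold calculation uses the reported estimate 0.04693 +/- 0.013211 directly. The sketch assumes that equation (21) has the form y = 1 − bt (an index starting at 1 at t = 0), so the index loss 1 − y is regressed on t through the origin. That form is an inference, consistent with all three reported times, rather than something stated explicitly.

```python
import math


def regression_through_origin(x, y):
    """Equations (5), (6), (18), and (20) for the one-parameter model y = b x."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)
    b_hat = sxy / sxx                                          # equation (5)
    ssr = sum((yi - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ssr / (n - 1)                                 # equation (6)
    var_b = sigma2_hat / sxx                                   # equation (18)
    r2 = sxy ** 2 / (sxx * syy)                                # equation (20), uncentered R^2
    return b_hat, var_b, math.sqrt(var_b), r2


def years_to_threshold(b, start=1.0, threshold=0.2):
    """Solve start - b * t = threshold for t (assumed form of equation 21)."""
    return (start - threshold) / b


# Hypothetical observations (NOT Table 1): time in years and performance index.
t_obs = [2.0, 4.0, 6.0, 8.0, 10.0]
index_obs = [0.90, 0.83, 0.70, 0.64, 0.52]
loss = [1.0 - yi for yi in index_obs]  # with y = 1 - b t, the loss 1 - y equals b t
b_hat, var_b, se_b, r2 = regression_through_origin(t_obs, loss)
print(f"hypothetical data: b={b_hat:.5f} se={se_b:.5f} R2={r2:.4f}")

# Reported five-observation results from the text.
b_reported, se_reported = 0.04693, 0.013211
for label, b in [("mean", b_reported),
                 ("b + 1 s.d.", b_reported + se_reported),
                 ("b - 1 s.d.", b_reported - se_reported)]:
    print(f"{label:>10}: {years_to_threshold(b):.2f} years")
```

With the reported figures this gives about 17.05, 13.30, and 23.73 years; the text rounds the first to 17.1. A faster deterioration rate (b plus one standard deviation) gives the shorter life.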
To facilitate comparison with later cases, the classical regression analysis with five observations has yielded the following results:

Before proceeding to analyze the foregoing data using Bayesian econometric methods, consider what would happen if we had fifteen (15) data points instead of the five presented. Table 2 contains the first five data points from the previous example and adds ten more observations for a total of fifteen. In this expanded fifteen-sample example, the estimated parameters are

Notice how much lower the standard deviation in the model parameter b is than when we had only five observations, because both the term n − 1 and the sum of the x_i² in the denominator of equation (18) have grown. The beneficial effects of gathering more data are clearly evident in this example: the range of uncertainty in the time until the pavement performance index reaches 0.2 has fallen from approximately 30 percent (when only five data points are available) to approximately 15 percent (when fifteen data points are available). Figure 2-3, the counterpart of Figure 2-1, shows the range of uncertainty given fifteen data points relative to the five data points discussed previously.

It is worth reemphasizing that the estimate b = 0.04605 does not recognize in any way whatsoever any information that might have existed prior to the fifteen observations. Even if 20,000 observations had been made before the fifteen observations being analyzed, those prior 20,000 observations would be completely ignored by the regression approach illustrated here. This is one of the major pitfalls of classical regression analysis: it is fundamentally incapable of recognizing any validity in previously existing estimates.

2.1.3 Pitfalls with Regression Analysis

Alas, what do we have if we do not have
enough data or enough variables? What if it is extremely expensive and painstakingly slow to assemble sufficiently high quality data? What if it takes fifteen or more years of highly controlled, continuously funded effort? What if there are so many prospective causal variables (independent variables) that they cannot realistically be measured long enough or extensively enough at the given number of sites? What if the data that exist are not good enough? Regression methods offer no answers other than "the findings are not yet statistically significant" or "the variance in the model parameters is too high." Regression methods offer no realistic interim way to quantify whatever uncertainty exists in results based on then-existing data, much less to transfer a representation of that uncertainty to real-world pavement management decisions. With regression methods, there is no solution until there is a final solution. The method we will outline below explicitly quantifies the uncertainty and shows how to transfer that uncertainty to today's decision makers.

The belief in regression analysis that the answer lies in the data frequently prompts organizations to undertake massive data gathering exercises: "If only we could gather a massive, accurate, comprehensive data base, we could then make statistically reliable parameter estimates and determine correct functional forms to quantify pavement deterioration." Such a perception is fraught with difficulties. Decisions cannot await final data. Data can be misleading and can prove far less valuable than initially expected. Highway departments have a great deal of knowledge today even though they might have limited data. Decisions must be made today based on the best data and knowledge available today. The challenge is to inform today's decisions with data as they emerge from the CLTPP program, gradually increasing the quality of those decisions. Such is the objective of the Bayesian statistical approach we outline and propose below. Neither the Canadian nor the United States LTPP program can hope to measure enough variables at every site to produce statistically significant estimates for all prospective causal variables.

2.2 A BETTER WAY: BAYESIAN STATISTICAL METHODS

This section presents a highly
simplified example that contrasts the regression methods summarized above with Bayesian statistical methods. The objective is to present the Bayesian concept in a very simple context, yet a context that is analogous to the pavement monitoring program. The simple context selected here is analogous in the sense that it involves making successive observations of an event and, after each observation, updating one's estimate of the probability of occurrence of that event. After a large number of observations of the event, the Bayesian estimate will essentially coincide with the standard classical estimate dictated by the observed frequencies. After only a handful of observations, however, we will have an improved estimate suitable for use in today's decision making. Again, we submit that no matter how much data is collected, it will still in essence be a "handful."

Suppose that a new missile has been produced and there is no history of operation of the missile; a missile of this type has never been launched before. It is our job as quality control engineers to determine the probability of success for any given flight. When we accept the assignment, we believe, based on engineers' assurances, that a randomly selected missile will fail with probability 1/3. However, we are not particularly confident of the engineers' prior probability assessment. We believe it to be correct only to within an error of 2/9. That is, the failure probability could be as high as 1/3 + 2/9 = 5/9 or as low as 1/3 - 2/9 = 1/9. To obtain a more accurate estimate of the probability of success, we decide to witness successive launches of the new missile, record whether they are successful or not, and attempt after each launch to derive a better estimate of the probability of failure.

Suppose we travel to the firing range and observe that the launch of the very first missile is a failure. After witnessing the first launch to be a failure, how should we revise our estimate of the probability of failure? Regression analysis would argue that the probability of failure should be 100 percent: one launch has been observed, and it was a failure. Classical statisticians would of course caution that their estimate is plagued by "small sample size" problems and therefore cannot be considered reliable or definitive. They would also argue that many more observations will be necessary before they can amass a statistically significant estimate of the probability of failure. Notwithstanding their small sample size caveats, regression methods would put the probability of failure after witnessing one failure and zero successes at 100 percent. They would utterly ignore any information other than the single test failure. By contrast, Bayesian statistics would estimate the probability of failure to be 2/5 after witnessing the single failure, and the variance in this estimate would be estimated at 6/25. Speaking loosely, Bayesian statisticians would believe the probability of failure to be 2/5 plus or minus 6/25, i.e., the probability could be as low as 4/25 or as high as 16/25.
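The text does not state which prior distribution produces these figures. One assumption that reproduces them exactly is a Beta(3, 6) prior on the failure probability (mean 3/9 = 1/3), updated by adding observed failures and successes to its two parameters; the sketch below uses that assumed prior and is not necessarily the authors' own model. It also shows that the quoted "variance" figures 2/9, 6/25, and 30/121 equal p(1 − p) at the current estimate p, i.e., the variance of a single launch outcome, whereas the posterior variance of the failure probability itself under this prior is smaller by a factor of the prior-plus-data count plus one (for example 6/275 after the first failure).

```python
from fractions import Fraction


def beta_update(alpha, beta, failures, successes):
    """Conjugate update of a Beta(alpha, beta) prior on the failure probability."""
    return alpha + failures, beta + successes


def summarize(alpha, beta):
    """Return (mean, p*(1-p), posterior variance of p) as exact fractions."""
    a, b = Fraction(alpha), Fraction(beta)
    mean = a / (a + b)
    bernoulli_var = mean * (1 - mean)                   # variance of one launch outcome
    posterior_var = a * b / ((a + b) ** 2 * (a + b + 1))  # variance of the Beta distribution
    return mean, bernoulli_var, posterior_var


def show(label, alpha, beta):
    mean, bern, post = summarize(alpha, beta)
    print(f"{label:<12} mean={mean}  p(1-p)={bern}  Var(p)={post}")


# Assumed prior: Beta(3, 6), mean 1/3 (hypothetical; chosen to match 2/5 and 5/11).
alpha, beta = 3, 6
show("prior", alpha, beta)
for launch in (1, 2):
    alpha, beta = beta_update(alpha, beta, failures=1, successes=0)
    show(f"failure {launch}", alpha, beta)

# Large-sample behaviour: 300 failures in 1000 hypothetical launches.
a_big, b_big = beta_update(3, 6, failures=300, successes=700)
print("after 1000 launches:", float(summarize(a_big, b_big)[0]), "vs frequency", 300 / 1000)
```

The first update gives 2/5 and the second gives 5/11, the figures reported in the text for one and two failures. After 1000 hypothetical launches the posterior mean (303/1009, about 0.3003) is practically the same as the observed frequency of 0.3, which is the convergence of the Bayesian and classical estimates that the text describes.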
Interestingly, the probability of failure has risen from
1/3 (before observing the single failure) to 2/5 after
observing the failure. This is quite different from the
100 percent probability of failure estimated by
regression methods on the basis of a single observation
(subject to the caveat of small sample size). Bayesian
statistics provides a systematic method to quantify rather
than qualify small sample size difficulties. Suppose we witness a successive second
launch, and it is a failure just as the first launch was.
Classical statisticians would continue to estimate the
probability of failure at 100 percent, their estimate
bolstered by two consecutive failures without a success.
Bayesian statisticians would revise their estimate of the
probability of success from its starting level of 2/5
upward to 5/11 and would revise the variance of their
estimate from its starting level of 6/25 to 30/121. Two
successive failures, and the probability of failure
estimated by Bayesian methods would still be below 50
percent (i.e., 5/11). As this simple example shows, the
Bayesian approach does not summarily discard the initial
estimate of the probability of failure (i.e., 1/3). On
the contrary, the Bayesian approach places a good deal of
credence in the prior estimate of 1/3 probability of
failure, adjusting it systematically and gradually as new
evidence comes in. Before and after every successive
observation, the Bayesian method provides an explicit
estimate of the probability of failure and the
uncertainty inherent in that estimate. We reiterate that
the Bayesian approach continues to quantify the true
implications of small samples sizes rather than simply
caveating them away. After a very large number of
observations have occurred, the results of the regression
and Bayesian methods would be identical. The two methods
converge to the same answer as the amount of data becomes
large. Implicitly, as a large quantity of data enters the
Bayesian adjustment process, the effect of the prior
estimate of 1/3 is superseded by the preponderance of new
evidence. However, while the amount of new data remains
small, only the Bayesian method provides an operationally
useful, credible, sensible estimate. We stress use of the
word evidence. The data assembled by watching
successive missile launches is evidence, i.e.,
information that changes one's initial state of knowledge
but does not necessarily obliterate it. Regression
methods completely obliterate any initial state of
knowledge; Bayesian statistical methods update it with
new evidence as it comes in. We believe this thinking to
be consonant with that of prudent pavement managers.

Before relating this simple example to
the pavement monitoring problem at hand, it is important
to note that the Bayesian approach yields a probability
estimate suitable for decision making after every missile
observation, be it a single observation or ten million
observations. The Bayesian method provides a probability
that embodies the best then-current information before
and after every observation. There is no sense
that one must gather mountains of data before anything
definitive can be said. One need not definitively
overcome the "small sample size" problem before
presenting real-world results and supporting real-world
decision making. It is this ability to generate usable
interim results, i.e., to generate meaningful results
from whatever long term pavement performance data has
been assembled to date, and apply them to immediate
decision making that motivates use of the Bayesian
approach for the long term pavement monitoring program.

Observing missiles as they succeed or
fail and adjusting one's estimate of the failure
probability is analogous to monitoring pavement
performance and adjusting one's estimate of the
deterioration mechanism as such monitoring proceeds.
Regression methods would argue that many, many missiles
must be observed before anything meaningful can be said
regarding the failure probability. By analogy, regression
methods would argue that the CLTPP program must gather
mountains of data from myriad sites before statistically
significant estimates of pavement deterioration can be
obtained. By contrast, the Bayesian statistical approach
we advocate in this report allows emerging data from the
limited number of sites that comprise the CLTPP program
to affect current knowledge, supplementing and gradually
displacing it until it is rendered obsolete altogether once
enough measurements have been amassed over the next
fifteen or more years. We do not believe it prudent or
politically viable to structure the CLTPP program so
that it is compelled to "wait until all the data is
in" before the program begins to show palpable
benefits by explicitly supporting real-world pavement
management decisions. Indeed, the political process has
much too high a discount rate to continue to fund
monitoring ten or fifteen years into the future in the
face of mounting budget pressure and lack of definitive
results. Promises that the "golden age" of
pavement science will emerge fifteen years hence are all
but lost on politicians and citizens clamoring for more
serviceable highways today. Imperfect as our
understanding of pavement deterioration might be, ongoing
efforts such as CLTPP must deliver interim benefits
quickly, efficiently, and reliably.

It is well to emphasize the major
contributions the CLTPP program can make, even during
its early years. In spite of the small sample size
problems that will beset the CLTPP program, which will be most acute
in the early years, the CLTPP program data are sure to
be better than anything else available, and we must
strive to incorporate them as quickly as possible in
pavement decision making. It is critically important that
the monitoring program be designed to have a
broad and rapid impact on pavement management. The
Bayesian methodology we will put forth in the next
section allows immediate and ongoing use of
the results of the pavement monitoring program and allows
CLTPP program managers to demonstrate and quantify
immediate benefits of their efforts.

The remainder of this section contains
a simple technical discussion of the Bayesian statistical
approach as a basis for understanding the framework we will
propose later in this report. The essence of the Bayesian
approach is disarmingly simple and dates back to the Reverend
Thomas Bayes' seminal publication in 1763. Bayes' theorem
is a statement about conditional probability, a statement
we will apply to the random terms introduced in the
regression formulation in Section 2.1. Denote

D = observed data (i.e., all measurements collectively from the CLTPP program sites)

b = parameters in the pavement deterioration function f(x, b)

I = prior information (i.e., information known today, before any monitoring or measurement occurs)

p(b | I) = prior probability distribution over the model parameters b. This probability distribution embodies what is known about pavement deterioration (as expressed by the parameters of the pavement deterioration function) before any CLTPP program data are assembled. There is no such concept in the regression approach, which implicitly assumes that nothing substantive is known in advance.

p(D | b, I) = probability that the model f(x, b) + e, using the parameter estimate b, will generate the observed data D. This term is often called the likelihood function.

p(b | D, I) = posterior probability distribution over the model parameters b after the data D are observed, i.e., the probability distribution over the parameters b conditional on observing the data D.

p(D | I) = unconditional probability distribution over the data D, i.e., the probability of observing D averaged over all possible values of the parameters b under the prior p(b | I).

Expressed succinctly, the "answer" after the data are assembled is the posterior probability distribution p(b | D, I). The information we have before initiating the pavement monitoring program is p(b | I). Bayes' theorem implies that the probability distribution over the parameters b after observing the data D (the "answer" we wish to obtain) is proportional to the probability distribution over b before observing D times the likelihood that the model using the parameters b will generate the observed data D. That is, the sought-after posterior distribution is proportional to the product of the prior distribution and the likelihood function. Expressed mathematically, Bayes' theorem is

$$p(b \mid D, I) = \frac{p(D \mid b, I)\, p(b \mid I)}{p(D \mid I)}$$

where the denominator p(D | I) does not depend on b and serves only to normalize the posterior so that it integrates to one. We shall use this simple result extensively in the next section.
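As a minimal sketch, not part of the CLTPP methodology itself, the missile-launch example discussed earlier in this section can be recast as an application of Bayes' theorem. We assume the prior on the failure probability is a Beta(3, 6) distribution. Under a Beta-Bernoulli model, this is the only Beta prior whose mean moves from 1/3 to 2/5 after one failure, and it then gives 5/11 after a second failure, matching the text. The report does not state its prior in this form, so treat it as an assumption. The Python below computes the estimates two ways: with the closed-form conjugate update, Beta(alpha + failures, beta + successes), and by evaluating likelihood times prior, divided by a normalizing constant, on a grid of candidate failure probabilities. The function names `beta_bernoulli_update`, `failure_estimate`, and `grid_posterior_mean` are hypothetical. The variance figures quoted in the text (6/25 and 30/121) coincide with p(1 − p), the Bernoulli variance of a single launch at the estimated failure probability p, so that is the quantity the sketch reports.

```python
from fractions import Fraction


def beta_bernoulli_update(alpha, beta, failures, successes):
    """Conjugate update (assumed model): a Beta(alpha, beta) prior on the
    failure probability, combined with Bernoulli launch outcomes, gives a
    Beta(alpha + failures, beta + successes) posterior."""
    return alpha + failures, beta + successes


def failure_estimate(alpha, beta):
    """Mean failure probability p of Beta(alpha, beta), plus p * (1 - p),
    which matches the 'variance' figures quoted in the text."""
    p = Fraction(alpha, alpha + beta)
    return p, p * (1 - p)


def grid_posterior_mean(alpha, beta, failures, successes, n=10_000):
    """Apply Bayes' theorem numerically on a grid of candidate failure
    probabilities q: posterior = likelihood * prior / normalizing constant."""
    qs = [(i + 0.5) / n for i in range(n)]  # midpoints of the interval (0, 1)
    # p(b | I): Beta prior density, left unnormalized
    prior = [q ** (alpha - 1) * (1 - q) ** (beta - 1) for q in qs]
    # p(D | b, I): probability of the observed failures and successes given q
    likelihood = [q ** failures * (1 - q) ** successes for q in qs]
    unnormalized = [lk * pr for lk, pr in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # proportional to p(D | I)
    posterior = [u / evidence for u in unnormalized]  # p(b | D, I) on the grid
    return sum(q * w for q, w in zip(qs, posterior))


if __name__ == "__main__":
    ALPHA0, BETA0 = 3, 6  # assumed prior, mean failure probability 1/3
    for failures in range(3):  # 0, 1, then 2 consecutive failures, no successes
        a, b = beta_bernoulli_update(ALPHA0, BETA0, failures, 0)
        p, spread = failure_estimate(a, b)
        grid_p = grid_posterior_mean(ALPHA0, BETA0, failures, 0)
        print(f"{failures} failures: exact p = {p}, p(1-p) = {spread}, grid p = {grid_p:.6f}")
```

The exact estimates rise from 1/3 to 2/5 to 5/11 as failures accumulate, and the grid calculation agrees closely with them. The conjugate form is convenient only for the simple pass/fail case; the grid form applies Bayes' theorem directly and extends to other likelihoods for a low-dimensional parameter b, such as one built from a deterioration model f(x, b) + e, where no closed-form posterior exists.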