Design of a Long Term Pavement Monitoring System for the Canadian Strategic Highway Research Program
2.3 THE BAYESIAN STATISTICAL APPROACH TO PAVEMENT MONITORING

2.3.1 Mathematical Development

This section extends the regression discussion of the linear model presented above by applying a Bayesian approach. The Bayesian approach begins in much the same way as the regression approach, with a postulated model of pavement deterioration and a series of observations. We retain the same linear model used in the previous section, y = bx + e, and the same n observations on y and x, which define n equations:

$$
\begin{aligned}
y_1 &= b\,x_1 + e_1\\
y_2 &= b\,x_2 + e_2\\
&\;\;\vdots\\
y_n &= b\,x_n + e_n
\end{aligned}
$$

In implementing the Bayesian approach, we must make explicit an assumption that regression analysis leaves implicit: a probabilistic assumption regarding the nature of the error terms in equations (24)-(26). That is, we postulate a specific mathematical equation, a probability distribution, that quantifies the error terms e1, e2, ..., en. We could postulate any probability distribution we wish, but we will postulate that the error terms are independent and identically distributed, each normal with mean zero and variance s²:

$$
p(e_i \mid s) = \frac{1}{\sqrt{2\pi}\,s}\exp\!\left(-\frac{e_i^2}{2s^2}\right), \qquad i = 1, 2, \dots, n.
$$

This assumption (commonly referred to as homoskedastic, nonautocorrelated errors) is not as benign as it might appear at first blush. We assume that the error terms are all governed by exactly the same normal probability distribution, with exactly the same zero mean and exactly the same variance s².

Assuming the n error terms are sampled independently from the same normal distribution, the likelihood function given n observations is the probability that the specific errors e1, e2, ..., en will occur, i.e., the probability that precisely these n samples will be drawn from the normal distribution. That probability is the product of the n normal densities evaluated at ei = yi - bxi:

$$
\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,s}\exp\!\left[-\frac{(y_i - b\,x_i)^2}{2s^2}\right]
= (2\pi)^{-n/2}\, s^{-n}\exp\!\left[-\frac{1}{2s^2}\sum_{i=1}^{n}(y_i - b\,x_i)^2\right].
$$

Written in terms of the regression quantities, this likelihood is equation (28), in which R2 is given by equation (20) and c1 is a normalizing constant that ensures that equation (28) integrates to unity.
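To make these quantities concrete, here is a minimal Python sketch, assuming the scalar, no-intercept model y = bx + e, so that the least squares estimate is b̂ = Σxiyi / Σxi² and the unbiased error variance divides the residual sum of squares by n - 1. The function names and the sample data are hypothetical illustrations, not part of the CLTPP design.

```python
import math

def least_squares_slope(xs, ys):
    """Least squares estimate b_hat for y_i = b*x_i + e_i (no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

def residual_sum_of_squares(b, xs, ys):
    """Sum of squared errors e_i = y_i - b*x_i for a trial value of b."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))

def unbiased_error_variance(xs, ys):
    """Unbiased estimate of s^2: residual sum of squares at b_hat over n - 1."""
    b_hat = least_squares_slope(xs, ys)
    return residual_sum_of_squares(b_hat, xs, ys) / (len(xs) - 1)

def log_likelihood(b, s, xs, ys):
    """Log of the product of n independent N(0, s^2) densities at e_i = y_i - b*x_i."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi) - n * math.log(s)
            - residual_sum_of_squares(b, xs, ys) / (2.0 * s * s))

# Hypothetical observations (x = some load or age measure, y = deterioration)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b_hat = least_squares_slope(xs, ys)
s2_hat = unbiased_error_variance(xs, ys)
print(f"b_hat = {b_hat:.4f}, s2_hat = {s2_hat:.4f}")
```

For any fixed s, the log-likelihood is largest at b = b̂, because Σ(yi - bxi)² = Σ(yi - b̂xi)² + (b - b̂)²Σxi²; the data therefore enter the likelihood only through b̂, the residual sum of squares, and Σxi².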
Equation (29) is simply the unbiased estimate of the error variance s², the same estimate obtained from regression methods in equation (6) above.

We assume that the prior probability distribution over the parameters b of the model and the variance s² of the random terms in the model is given by a rather general functional form, equation (30), in which c0 is a normalizing constant such that the integral is unity. The terms k, M, m, and b0 are parameters we shall use to quantify our prior state of information. The parameter b0 represents the mean of our prior estimate of the model parameters b. (The number in the missile example analogous to b0 is the initially estimated failure probability of 1/3.) We discuss how these parameters are set below. The prior probability distribution is a joint probability distribution over the model parameters b and the variance s², and therefore implicitly represents our total state of information before any observations are made, i.e., before any data are collected by the CLTPP program. In particular, the prior probability distribution represents not only what we believe to be true about the pavement deterioration parameters b but also what we believe to be true about the random disturbance terms ei.

The best summary measure of what we know about pavement deterioration before any long term monitoring is performed is the marginal distribution over the model parameters b implied by the joint prior distribution in (30). Integrating the error parameter s out of (30), we obtain the marginal distribution over b in equation (31), where B(·,·) is the beta function:

$$
B(u, v) = \int_{0}^{1} t^{\,u-1}(1 - t)^{\,v-1}\,dt = \frac{\Gamma(u)\,\Gamma(v)}{\Gamma(u + v)}.
$$

The variance of b under equation (31) is given by equation (33) and is finite as long as k is strictly larger than 4. Comparing equation (31) with the probability density function of the t-distribution, we can show that the standardized variable in equation (34) follows a t-distribution with k - 2 degrees of freedom. Equation (34) reemphasizes that the mean estimate of the model parameters b before any data are gathered is b0, i.e., the mean of the prior probability distribution over the model parameters is b0. Equation (33) indicates that the variance of the prior distribution grows as M grows and decreases as m and k grow. Figure 24 plots the marginal
distribution over the model parameters b for various values of k. As we shall show later, the parameter k can be interpreted heuristically as the "size of the preexisting data base" that underlies the prior probability distribution. Large values of k imply that the prior probability distribution is based on large quantities of data (and thereby less uncertainty), while small values of k imply less prior data (and thereby more uncertainty). Notice in the figure that larger values of k produce tighter prior distributions about the mean b0, i.e., better prior knowledge.

We now combine the prior probability distribution in equation (30) with the likelihood function in equation (28) as dictated by Bayes' Theorem, under which the posterior distribution is proportional to the product of the prior and the likelihood. Carrying out this product shows that the posterior joint probability distribution over the model parameters b and the variance s² of the random terms has the same mathematical form as the prior in (30). The posterior joint distribution over the model parameters b and the error variance s² is given by equation (35), where b̂ is the least squares estimate defined in equation (5). The term c1 is a normalizing constant that makes the joint probability distribution in (35) integrate to unity. Notice the direct analogy in form between the prior distribution in equation (30) and the posterior distribution in equation (35). This property, a common form shared by the prior and the posterior, is called a conjugate relationship. Conjugate prior and posterior distributions such as (30) and (35) are extremely convenient because they forestall much analytical complexity: updating the prior with data reduces to updating the values of its parameters. It is not particularly useful for our
purposes to analyze the joint probability distribution over the model parameters b and the variance s² in the error terms. Rather, the critical result we seek is the unconditional (marginal) posterior distribution over the model parameters b, which contains the richest and fullest possible representation of both the prior information and the new data gathered during the pavement monitoring process.

This marginal posterior distribution is obtained by integrating equation (35) over all possible values of s². Performing the necessary integration, we can show that the marginal posterior distribution over the model parameters b is given by equation (36), where c2 is a normalizing constant ensuring that the expression integrates to unity. If we define shorthand for the terms appearing in equation (36), including the posterior mean b* of equation (38), the probability distribution in equation (36) can be written in the form of equation (40), where B(·,·) again represents the beta function. Comparing equation (40) with the expression for the t-distribution, we can show that the corresponding standardized variable follows a t-distribution with n + k - 2 degrees of freedom. A t-distribution with n + k - 2 degrees of freedom has a mean of zero and a variance of (k + n - 2)/(k + n - 4) as long as k + n > 4. This means that the variable b* defined in equation (38) is the mean of the marginal posterior distribution over the model parameters b. Furthermore, the variance of the probability distribution in equation (40) can be derived in closed form, again provided that k + n > 4.

Examination of the expression in equation (38) for the mean of the marginal posterior distribution over the model parameters b reveals a critically important finding: the posterior mean b* is a weighted average of the prior mean b0 and the classical least squares estimate b̂ from equation (5), as expressed in equation (43). This equation shows pavement engineers
precisely and unambiguously how to weight their prior state of information against the new information gained from the CLTPP program! This result is a linchpin of Bayesian statistical methods, allowing decision makers to balance new evidence as it comes in against existing experience and practice, and to know precisely how that balance should be struck. The structure of equation (43) shows that as more and more data are gathered, the first weighting term (the weight on the least squares estimate b̂) approaches unity and the second weighting term (the weight on the prior estimate b0) approaches zero. Whatever the specific parameters m, M, and k of the prior distribution in equation (30), the more data that are gathered, the more closely the Bayesian approach approximates the regression approach. However, equation (43) also tells us precisely what our estimate b* of the mean value of the model parameters should be as an explicit function of the particular data gathered. There is no caveat or apology for "small sample sizes." There is always an explicit estimate, one that begins with the prior mean b0 and proceeds inexorably toward the regression result b̂.

More important than having a continuously updated estimate b* of the structural model parameters, we actually have a probability distribution over the model parameters and results. In particular, the posterior distribution {b, s | D, I} over the model parameters b and the error variance s², as defined by equation (35), combined with the structural model y = f(x, b) + e, implies a probability distribution over pavement deterioration y for any setting x of the independent variables. The probability distribution thus derived over pavement performance can be abbreviated {y | x}. This probability distribution over
pavement performance y as a function of the independent
variables x (which will be measured under the CLTPP
program) is the fundamental grist from which pavement
management decisions are made. In particular, at every
point in the evolution of the CLTPP program, we will
have a complete probability distribution over pavement
performance y as a function of the observed variables x.
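The following Python sketch shows, under explicitly assumed conventions, how such a predictive distribution can be generated. It uses the textbook normal-inverse-gamma conjugate prior for a single slope, b | s² ~ N(b0, s²·M0) and s² ~ Inverse-Gamma(a0, d0). This parameterization and the names M0, a0, d0, conjugate_update and predictive_draws are hypothetical and are not claimed to match equation (30) term by term; the data and prior values are invented for illustration.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Posterior:
    b_mean: float  # posterior mean of b (plays the role of b*)
    M: float       # conditional variance multiplier: b | s2 ~ N(b_mean, s2 * M)
    a: float       # inverse-gamma shape of s2
    d: float       # inverse-gamma scale of s2

def conjugate_update(xs, ys, b0, M0, a0, d0):
    """Normal-inverse-gamma update for y_i = b*x_i + e_i, e_i ~ N(0, s2)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    prec0 = 1.0 / M0               # prior precision multiplier for b
    prec_n = prec0 + sxx           # posterior precision multiplier
    b_n = (prec0 * b0 + sxy) / prec_n
    a_n = a0 + len(xs) / 2.0
    d_n = d0 + 0.5 * (syy + prec0 * b0 ** 2 - prec_n * b_n ** 2)
    return Posterior(b_n, 1.0 / prec_n, a_n, d_n)

def predictive_draws(post, x, n_draws, rng):
    """Monte Carlo draws of y at setting x: sample s2, then b, then y."""
    draws = []
    for _ in range(n_draws):
        s2 = post.d / rng.gammavariate(post.a, 1.0)     # s2 ~ Inverse-Gamma(a, d)
        b = rng.gauss(post.b_mean, math.sqrt(s2 * post.M))
        draws.append(rng.gauss(b * x, math.sqrt(s2)))   # y = b*x + e
    return draws

# Hypothetical prior and data
post = conjugate_update([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1],
                        b0=1.5, M0=1.0, a0=3.0, d0=2.0)
rng = random.Random(42)
ys_new = sorted(predictive_draws(post, x=6.0, n_draws=20000, rng=rng))
print(f"b* = {post.b_mean:.4f}")
print(f"predictive mean at x = 6: {sum(ys_new) / len(ys_new):.3f}")
print(f"approx. 90% interval: [{ys_new[1000]:.3f}, {ys_new[19000]:.3f}]")
```

With this parameterization the marginal posterior of b is a t-distribution with 2a0 + n degrees of freedom, which lines up with the n + k - 2 quoted earlier if 2a0 is identified with k - 2; that identification is likewise an assumption. Monte Carlo draws are used here for transparency; the predictive distribution of y is also available in closed form as a t-distribution.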
This probability distribution:

· is explicit.
· is quantitative.
· supports pavement management decisions from the first day of the program to the last.
· balances all prior information and all program information.
· works no matter how large or small the data base assembled by the program.
· approaches the regression situation if the data base ever becomes sufficiently large and comprehensive.
· is easily communicated to pavement decision makers, who will need to know how to balance prior experience against CLTPP data.

To support traditional, deterministic pavement management models, we can compute the expected value of pavement performance y as a function of the observed variables x. Implicitly, this is the same as delivering to highway agencies the following functional form for the predictive model, based on the mean parameter estimates b* from the posterior distribution:

$$
y = f(x, b^*) \qquad (45)
$$

We will discuss the deterministic approach based on equation (45) in some detail in the next section. A better course than relying on deterministic methods is to deliver the entire probability distribution {y | x} to a probabilistic semi-Markovian model that supports inherently probabilistic decision making. We discussed this approach at length at the Second North American Conference on Managing Pavements, and we will summarize its most salient elements two sections hence.

2.3.2 Representing Prior Knowledge Using a Prior Probability Distribution

It is important at this point to
discuss the issue of a prior probability distribution head-on. Why should one be burdened with estimating a prior probability distribution over the structural model parameters b when regression methods seem to require no such input? How can one reliably quantify what is known before the CLTPP program is initiated? The answer, as we shall demonstrate here, is that regression methods do implicitly use a prior probability distribution of precisely the form in equation (30). Unfortunately, the prior probability distribution implicit in regression methods is completely unrealistic: it tacitly relies on extreme settings of the parameters m, M, and k in equation (30). We will show that it is far preferable, and far more realistic, to assemble reasonable values of m, M, and k that characterize prior probability distributions truly representing the available information than to rely on the completely unreasonable prior probability distribution inescapably resident in regression analysis. The remainder of this section identifies the prior probability distribution that is implicitly resident within the regression approach.

Pursuing the relationship between the regression approach and the Bayesian approach further, it is never correct to assume that the prior state of information is zero. Even in the missile problem in Section 2.2, in which there had been no prior flights, prior information was nonetheless available. The Bayesian approach was able to use it, but the regression approach systematically ignored it.

Does the regression approach make any tacit assumptions about the prior state of information? The answer is yes. In particular, if the prior distribution over model parameters is assumed to be uniform from negative infinity to positive infinity, the Bayesian approach gives an estimate identical to that of the regression approach. Expressed alternatively, the regression approach tacitly assumes that the prior state of knowledge before the CLTPP program is initiated is zero: nobody knows anything, and nobody has any inkling whatsoever of the mechanism of pavement deterioration or of the parameters that affect it. Yet we know this is not true. Working pavement engineers possess a good deal of firsthand knowledge of many aspects of pavement deterioration in their service areas. The experience base of working pavement engineers is far from trivial and should not be ignored. Our Bayesian approach provides an explicit way to begin with that knowledge and experience base and to evolve systematically away from it, i.e., to "learn," as the CLTPP program delivers better and more controlled data.

The regression approach would ask
working pavement engineers to abandon current practice and experience immediately in favor of statistical data from CLTPP, for it would discount all prior information. Prudent pavement engineers would never discard their long experience in favor of a small amount of data, however accurate. Rather, prudent pavement engineers would weigh new data (new evidence) against their preexisting experience, initially giving the new data very little weight but weighting it more heavily as more definitive data are collected. The largely intuitive process by which pavement engineers weigh the evidence of new data against their past experience is precisely replicated by the Bayesian statistical method.
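The weighting of prior experience against new data can be seen in a short derivation. As a sketch only, assume a single scalar parameter and a conditional prior b | s² ~ N(b0, s²M), which is one common conjugate parameterization and is not claimed to reproduce equation (30) exactly. Multiplying this prior by the normal likelihood and completing the square in b gives:

$$
\begin{aligned}
p(b \mid s^2, D) &\propto \exp\!\left\{-\frac{1}{2s^2}\left[\sum_{i=1}^{n}(y_i - b\,x_i)^2 + \frac{(b - b_0)^2}{M}\right]\right\},\\[4pt]
b^* &= \frac{\sum_{i} x_i y_i + b_0/M}{\sum_{i} x_i^2 + 1/M}
     = w\,\hat{b} + (1 - w)\,b_0,
\qquad
w = \frac{\sum_{i} x_i^2}{\sum_{i} x_i^2 + 1/M},
\qquad
\hat{b} = \frac{\sum_{i} x_i y_i}{\sum_{i} x_i^2}.
\end{aligned}
$$

Because w does not depend on s², the marginal posterior obtained after integrating out s² is centered at the same b*. As observations accumulate, Σxi² grows and w tends to 1; as M grows without bound (the diffuse limit), w also tends to 1 for any data set, and b* coincides with the least squares estimate.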
In fact, the Bayesian statistical procedure, as we saw above, calculates explicit weights for prior experience versus new data and helps pavement engineers understand the appropriate relative weights over time. Equation (38) is precisely such a weighting scheme, telling pavement engineers to begin with their prior estimate b0 and gradually evolve over time to the statistically derived result b̂, initially weighting the former more heavily but eventually weighting the latter more heavily. Providing explicit guidance on how pavement decision makers should weight past and future information will greatly enhance the acceptance and impact of the CLTPP data as they are assembled.

Turning now to several technical issues
surrounding the prior distribution in equation (30), we should emphasize that the mathematical form of the prior distribution in equation (30) contains a sufficient number of parameters (k, M, m, and b0) that it can approximate virtually any prior probability distribution one might wish. For example, we can approximate the situation of maximum prior uncertainty (i.e., no idea at all what the structural model parameters b are) with a uniform probability distribution by setting the parameter M to a very large value (approaching infinity), the parameter m to a very small value (approaching zero from above), and the parameter k to a small value (approaching 1 from above). Such settings of m, M, and k make the marginal distribution over b implied by equation (30) uniform between negative infinity and positive infinity. We term such a prior probability distribution a "diffuse prior."

We have already noted that the Bayesian statistical procedure using a diffuse prior gives exactly the same estimates of the model parameters b as the regression procedure. The mathematical form in equation (30) is sufficiently general that the regression procedure is obtained as a trivial and unrealistic special case of the Bayesian approach, i.e., vanishingly small m, k slightly above 1, and very large M. We emphasize that use of the proposed Bayesian procedure does not preclude use of the regression procedure; indeed, the regression result can be obtained as a special case. However, the Bayesian procedure admits a rich range of additional cases that allow better quantification of prior information and a better balance between that initial information and CLTPP information. In particular, the mathematical form in equation (30) allows the use of prior probability distributions that are not uniform between negative infinity and positive infinity. Settings of the parameters m, k, M, and b0 other than the extreme values mentioned previously cause the prior probability distribution to "cluster" about the mean estimate b0 in a symmetrical fashion, much like a normal probability distribution, and to have a finite standard deviation (provided k > 4). Settings of the parameters m, M, and k control not only the standard deviation but also the "fatness" of the tails of the prior probability distribution. In our experience, there are enough degrees of freedom in the functional form in equation (30) and the parameters m, M, and k to approximate virtually any symmetrical prior probability distribution.
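The role of k in controlling tail fatness can be checked numerically. The sketch below is a minimal illustration, assuming, as stated earlier in this section, that the standardized prior variable follows a t-distribution with k - 2 degrees of freedom. It uses only the Python standard library to compare the t density with the standard normal density four scale units from the center, and reports the t variance (k - 2)/(k - 4), which is finite only for k > 4. The function names are hypothetical.

```python
import math

def t_pdf(z, dof):
    """Density of the standard Student t distribution with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(z * z / dof))

def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def t_variance(dof):
    """Variance dof / (dof - 2) of the standard t; infinite when dof <= 2."""
    return dof / (dof - 2) if dof > 2 else math.inf

for k in (5, 8, 20, 100):
    dof = k - 2   # degrees of freedom of the prior marginal, per the text
    ratio = t_pdf(4.0, dof) / normal_pdf(4.0)
    print(f"k = {k:3d}: variance = {t_variance(dof):.3f}, "
          f"density at 4 relative to normal = {ratio:.2f}")
```

Small k gives far heavier tails than a normal distribution; as k grows the t density approaches the normal, consistent with reading k as the size of the data base behind the prior.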
This is a critical insight, for it allows us to encode prior probability distributions subjectively using well established techniques and thereafter to approximate those probability distributions using the mathematical form in equation (30). Such approximation can be accomplished systematically (for example, by minimizing squared error), graphically, or subjectively. The error of such an approximation is well within the bounds of error resident in the subjective prior probability distribution itself and certainly does not compromise the results. On the contrary, the diffuse prior assumption intrinsically resident in the regression technique is so much worse that virtually any approximate technique is bound to be better.

Probability elicitation techniques motivated initially by Savage (1972) and advanced by Kahneman, Slovic, and Tversky (1981) will be particularly useful in quantifying preexisting information that resides "in the heads" of working pavement engineers as a starting point for the long term pavement performance program. Some might argue that such information, elicited subjectively from cognizant, experienced pavement engineers, is arbitrary and worthless. We would counter that it is better, indeed far better, than the naive assumption implicit in regression methods that nobody knows anything not embedded in CLTPP data gathered under the program. Bayesian statistical methods give a balance, indeed a proper balance, between prior expert judgment and statistical evidence assembled under this program.

We must reiterate that nobody can
force field engineers to adopt the results of long term pavement performance monitoring. They will continue to do exactly what they want. For them to embrace the results of long term pavement performance monitoring, they must understand its relevance and its proper balance relative to the practices upon which their careers are founded. What pavement engineer of twenty or more years' tenure would abandon established and accepted practice in favor of a small amount of pavement monitoring data? Who would risk the ire of the traveling public by adopting "radical, new, unproven methods" emerging from embryonic long term pavement monitoring? Assuredly, no one. By contrast, pavement engineers would probably be positively disposed toward balancing their established procedures against initially small but growing long term pavement monitoring information that leads to incremental improvements in established practice. Such careful and prudent evolution would decrease, not increase, the risk of using new methods and would ensure their ultimate proliferation.

Some would argue that the need for a prior distribution introduces a large measure of subjectivity into an otherwise highly scientific program designed to monitor long term pavement performance carefully. While we agree, we would counter that it is the subjectively held opinions of pavement engineers that lie at the heart of pavement design and management decisions, and that it is these opinions that must be modified if long term pavement monitoring is ever to succeed. If the visceral models of pavement deterioration held by decentralized decision makers all over the world are not changed, then long term pavement performance monitoring is fundamentally futile.

In summary, we reiterate several key
observations regarding the prior probability distribution as quantified by equation (30):

· It can adequately approximate any prior state of knowledge that exists, by appropriate setting of the parameters.
· It is infinitely preferable to the diffuse prior assumption intrinsic in the regression approach, which systematically ignores any and all prior information.
· Its impact on the result diminishes gradually as more statistical evidence is assembled during the CLTPP program. The major effects of the prior occur while the CLTPP sample size is small. As the CLTPP sample size becomes large, the Bayesian and regression results converge. Realistically, just as with the simple missile example, the sample size will never become large relative to the number of unknown factors. Small sample size will be a permanent affliction of LTPP, and so the Bayesian approach will remain permanently better than the regression approach.
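The convergence described in the last observation can be illustrated with simulated data. The sketch below is hypothetical throughout: it assumes the scalar model y = bx + e, an invented true slope and noise level, and the conditional prior b | s² ~ N(b0, s²M0), under which the posterior mean is b* = w b̂ + (1 - w) b0 with w = Σxi²/(Σxi² + 1/M0). It tracks how the weight on the least squares estimate grows as the same data set is extended.

```python
import random

def posterior_mean(xs, ys, b0, M0):
    """Return (b*, b_hat, w), where b* = w*b_hat + (1 - w)*b0 under the assumed prior."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    w = sxx / (sxx + 1.0 / M0)
    return w * b_hat + (1.0 - w) * b0, b_hat, w

rng = random.Random(1)
true_b, noise_sd = 2.0, 0.5     # hypothetical "true" deterioration rate and error s
b0, M0 = 1.5, 0.01              # hypothetical, fairly confident prior
xs, ys, history = [], [], []
for n in (2, 10, 50, 250, 1000):
    while len(xs) < n:          # grow the same data set cumulatively
        x = rng.uniform(0.5, 2.0)
        xs.append(x)
        ys.append(true_b * x + rng.gauss(0.0, noise_sd))
    b_star, b_hat, w = posterior_mean(xs, ys, b0, M0)
    history.append((n, w, b_star, b_hat))
    print(f"n = {n:5d}: weight on b_hat = {w:.3f}, b* = {b_star:.3f}, b_hat = {b_hat:.3f}")
```

With a diffuse prior (M0 very large) the weight is essentially 1 from the first observation, reproducing the regression result; with an informative prior the weight on b̂ rises toward 1 only as data accumulate.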