Goodness-of-Fit Tests for the Generalized Pareto Distribution

Abstract: Tests of fit are given for the generalized Pareto distribution (GPD), based on Cramér-von Mises statistics. Examples are given to illustrate the estimation techniques and the goodness-of-fit procedures. The tests are applied to the exceedances over given thresholds for 238 river flows in Canada; in general, the GPD provides an adequate fit. The tests are useful in deciding the threshold in such applications; this use is investigated, as is the closeness of the GPD to some other distributions that might be used for long-tailed data.

KEY WORDS: Exceedances; Extreme values; Hydrology; Threshold selection.

The generalized Pareto distribution (GPD) has the following distribution function:

F(x) = 1 - (1 - kx/a)^{1/k}, (1)

where a is a positive scale parameter and k is a shape parameter. The density function is

f(x) = (1/a)(1 - kx/a)^{(1-k)/k}; (2)

the range of x is 0 ≤ x < ∞ for k ≤ 0 and 0 ≤ x ≤ a/k for k > 0. The mean and variance are μ = a/(1 + k) and σ² = a²/{(1 + k)²(1 + 2k)}; thus the variance of the GPD is finite only for k > -0.5. For the special values k = 0 and k = 1, the GPD becomes the exponential and uniform distributions, respectively. The name generalized Pareto was given by Pickands (1975); the distribution is sometimes called simply Pareto when k < 0. In this case, the GPD has a long tail to the right and has been used to model datasets that exhibit this form in several areas of applied statistics.
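In code, the distribution and its moments are straightforward; the sketch below (plain Python, a hypothetical helper, not from the article) follows the parameterization of (1) and (2), with the k = 0 exponential limit handled explicitly:

```python
import math

def gpd_cdf(x, a, k):
    """Distribution function (1); the k = 0 limit is the exponential."""
    if k == 0:
        return 1.0 - math.exp(-x / a)
    return 1.0 - (1.0 - k * x / a) ** (1.0 / k)

def gpd_pdf(x, a, k):
    """Density function (2)."""
    if k == 0:
        return math.exp(-x / a) / a
    return (1.0 / a) * (1.0 - k * x / a) ** ((1.0 - k) / k)

def gpd_mean_var(a, k):
    """Mean a/(1 + k); the variance is finite only for k > -0.5."""
    mu = a / (1.0 + k)
    var = a * a / ((1.0 + k) ** 2 * (1.0 + 2.0 * k)) if k > -0.5 else float("inf")
    return mu, var

# k = 1 gives the uniform distribution on [0, a]: F(x) = x/a
print(gpd_cdf(0.5, 1.0, 1.0))   # 0.5
```

The two special cases quoted in the text (exponential at k = 0, uniform at k = 1) make convenient sanity checks for any implementation.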

In particular, the GPD is used to model extreme values. This application has been discussed by many authors--for example, Hosking and Wallis (1987), Smith (1984, 1989, 1990), Davison (1984), and Davison and Smith (1990). Smith (1990) gave an excellent review of the two most widely used methods in this field, based on generalized extreme value distributions and on the GPD. In hydrology, the GPD is often called the "peaks over thresholds" (POT) model, since it is used to model exceedances over threshold levels in flood control. Davison and Smith (1990) discussed this application in their section 9, using river-flow exceedances for a particular river over a period of 35 years. The authors fit the GPD to exceedances over a series of thresholds and also calculated the Kolmogorov-Smirnov and Anderson-Darling statistics to test the fit. In the article, they mentioned the lack of tests for the GPD [echoing a remark made earlier by Smith (1984)]. In the absence of such tests, they used tables for testing exponentiality, which, as the authors pointed out, give critical values that are too high. This dataset is tested for the GPD in Section 3.

In this article, goodness-of-fit tests are given for the GPD, based on the Cramér-von Mises statistic W² and the Anderson-Darling statistic A². We concentrate on the most practical case, in which the parameters are not known. Estimation of parameters is discussed in Section 1, and the goodness-of-fit tests are given in Section 2. In Section 3, they are applied to exceedances for a Canadian river, and it is shown how the tests can be used to help select the threshold in the POT model, following ideas suggested by Davison and Smith (1990). Moreover, the Davison-Smith example is revisited, using the GPD test and the Anderson-Darling statistic. The tests are then used to model the exceedances over thresholds of 238 Canadian river-flow series; they indicate the adequacy of the GPD fit. The technique of choosing the threshold is based on a stability property of the GPD; the efficacy of the technique when the true exceedance distribution is not GPD is examined in Section 4. In Section 5, we investigate the versatility of the GPD as a distribution that might be used for many types of data with a long tail and no mode in the density. Finally, the asymptotic theory of the tests is given in Section 6.

1. ESTIMATION OF PARAMETERS

Before the test of fit can be made, unknown parameters in (1) must first be estimated. The asymptotic theory of the test statistics requires an efficient method of estimation; we shall use maximum likelihood. Although it is theoretically possible to have datasets for which no solution exists to the likelihood equations, in practice this appears to be extremely rare. For example, in our examination of the 238 Canadian rivers, maximum likelihood estimates of the parameters could be found in every case. Thus we shall assume that such estimates exist for the dataset under test. Hosking and Wallis (1987), Castillo and Hadi (1997), and Dupuis and Tsao (1998) studied other methods of estimating the parameters. One of these methods is the use of probability-weighted moments; although there appear to be some advantages in terms of bias, Chen and Balakrishnan (1995) and Dupuis (1996) later showed that this technique is not always feasible.

We now discuss estimation by maximum likelihood. Suppose that x_1, ..., x_n is a given random sample from the GPD given in (1), and let x_(1) ≤ x_(2) ≤ ... ≤ x_(n) be the order statistics. We consider three distinct cases: Case 1, in which the shape parameter k is known and the scale parameter a is unknown; Case 2, in which the shape parameter k is unknown and the scale parameter a is known; and Case 3, in which both parameters a and k are unknown. Case 3 is the most likely situation to arise in practice. The log-likelihood is given by

L(a, k) = -n log a - (1 - 1/k) Σ_{i=1}^n log(1 - k x_i/a) for k ≠ 0,
L(a, 0) = -n log a - Σ_{i=1}^n x_i/a for k = 0. (3)

The range for a is a > 0 for k ≤ 0 and a > k x_(n) for k > 0. When k < 1/2, Smith (1984) showed that, under certain regularity conditions, the maximum likelihood estimators are asymptotically normal and asymptotically efficient. When 0.5 ≤ k < 1, Smith (1984) identified the problem as nonregular, which alters the rate of convergence of the maximum likelihood estimators. For k ≥ 1, and as n → ∞, the probability approaches 1 that the likelihood has no local maximum. We now consider Cases 1 to 3 separately.

Case 1 (k known, a unknown). For this case, we have the following result.

Proposition 1. For any known k with k < 1, the maximum likelihood estimate of a exists and is unique.

Proof. For k = 0 (the exponential distribution), the result is well known. Suppose that k ≠ 0; then â, the maximum likelihood estimate of a, will be a solution of ∂L(a, k)/∂a = 0, which may be simplified to L₁(a) = 0, where L₁(a) = n - (1 - k) Σ_{i=1}^n x_i/(a - k x_i). The value of ∂²L(a, k)/∂a² at a = â is -(1 - k) Σ_{i=1}^n x_i (â - k x_i)^{-2}/â < 0, which implies that at a = â the likelihood function attains its maximum value. Moreover, L₁(a) is an increasing function on the range of a, because ∂L₁(a)/∂a = (1 - k) Σ_{i=1}^n x_i/(a - k x_i)² > 0; the function can take negative and positive values, so it cuts the a axis at exactly one point. Hence â is unique.
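Since L₁(a) is increasing in a and changes sign exactly once, the Case 1 estimate can be found by simple bisection; a minimal sketch (a hypothetical helper, with the starting bracket chosen from the range of a stated above):

```python
def mle_scale_known_k(x, k, tol=1e-10):
    """Case 1: solve L1(a) = n - (1 - k) * sum x_i/(a - k x_i) = 0 by bisection.
    Requires k < 1; L1 is increasing in a, so the root (the MLE of a) is unique."""
    n = len(x)

    def L1(a):
        return n - (1.0 - k) * sum(xi / (a - k * xi) for xi in x)

    lo = max(k * max(x), 0.0) + 1e-12   # a > k x_(n) for k > 0, a > 0 otherwise
    hi = lo + 1.0
    while L1(hi) < 0.0:                 # expand upward until L1 changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L1(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# k = 0 reduces to the exponential MLE, the sample mean
print(mle_scale_known_k([1.0, 2.0, 3.0], 0.0))   # ~2.0
```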

Case 2 (a known, k unknown). In this situation, there may or may not exist a maximum likelihood estimate for k. To see this, consider the likelihood function L(a, k) given in (3) for -∞ < k ≤ a/x_(n). Since a is known, the likelihood will be regarded as a function L(k) of k only. Set k* = a/x_(n); then

lim_{k → -∞} L(k) = -∞, (4)

lim_{k → k*} L(k) = -∞ if k* < 1, (5)

and

lim_{k → k*} L(k) = +∞ if k* > 1. (6)

Note that the value of k is not restricted to be less than or equal to 1. From (4) and (5), it follows that, if k* < 1, there is at least one maximum. For a fixed sample of size n, Pr(k* < 1) = 1 - [1 - (1 - k)^{1/k}]^n > 1 - (3/4)^n. Similarly, if k* > 1, it follows from (4) and (6) that there may or may not exist a local maximum.

Case 3 (both parameters unknown). Maximum likelihood estimation of the parameters a and k when both are unknown was discussed in detail by Grimshaw (1993). This case is similar to Case 2: in principle, maximum likelihood estimates for a and k may not exist. However, as stated previously, this is unlikely in practical applications, especially when the sample size is reasonably large. To find a solution, Davison (1984) pointed out that, by a change of parameters from (a, k) to (θ, k), where θ = k/a, the problem is reduced to a unidimensional search: we search for the value θ̂ that gives a local maximum of the profile log-likelihood (the log-likelihood maximized over k). This is

L*(θ) = -n - Σ_{i=1}^n log(1 - θ x_i) - n log{-(nθ)^{-1} Σ_{i=1}^n log(1 - θ x_i)} (7)

for θ < 1/x_(n). Suppose that a local maximum θ̂ of (7) can be found; then

k̂ = -n^{-1} Σ_{i=1}^n log(1 - θ̂ x_i) (8)

and

â = k̂/θ̂. (9)
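A sketch of the Case 3 search follows, using a crude grid over θ for illustration; the search region and the cutoff k(θ) < 1 (which avoids the degenerate boundary behavior noted above) are choices of this sketch, not of the article, and a careful implementation would refine the located maximum, e.g. by golden-section search:

```python
import math

def gpd_mle(x, grid=4000):
    """Case 3 sketch: maximize the profile log-likelihood (7) over theta = k/a
    by a plain grid search, then recover k and a from (8) and (9)."""
    n, xmax = len(x), max(x)
    best_t, best_l = None, -float("inf")
    for i in range(1, grid):
        t = (-5.0 + 6.0 * i / grid) / xmax       # theta in (-5/x_(n), 1/x_(n))
        if abs(t) < 1e-12:
            continue                             # skip the singular point theta = 0
        s = sum(math.log(1.0 - t * xi) for xi in x)
        if -s / n >= 1.0:
            continue                             # k(theta) >= 1: degenerate boundary region
        l = -n - s - n * math.log(-s / (n * t))  # profile log-likelihood, eq. (7)
        if l > best_l:
            best_t, best_l = t, l
    k = -sum(math.log(1.0 - best_t * xi) for xi in x) / n   # eq. (8)
    return k / best_t, k                                    # eq. (9): a = k/theta
```

For data resembling an exponential sample, the fitted k̂ should be near 0 and â near the sample mean.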

2. GOODNESS-OF-FIT TESTS

In this section, the Cramér-von Mises statistic W² and the Anderson-Darling statistic A² are described. The Anderson-Darling statistic is a modification of the Cramér-von Mises statistic that gives more weight to observations in the tail of the distribution, which is useful in detecting outliers. The null hypothesis is H₀: the random sample x_1, ..., x_n comes from Distribution (1).

When parameters a and k are known in (1), the GPD is completely specified; we call this situation Case 0. Then the transformation z_i = F(x_i) produces a z sample that will be uniformly distributed between 0 and 1 under H₀. Many tests of the uniform distribution exist, including Cramér-von Mises tests (see Stephens 1986), so we shall not consider this case further. In the more common Cases 1, 2, and 3, when one or both parameters must be estimated, the goodness-of-fit test procedure is as follows:

1. Find the estimates of the unknown parameters as described previously, and make the transformation z_(i) = F(x_(i)), for i = 1, ..., n, using the estimates where necessary.

2. Calculate the statistics W² and A² as follows:

W² = Σ_{i=1}^n {z_(i) - (2i - 1)/(2n)}² + 1/(12n)

and

A² = -n - (1/n) Σ_{i=1}^n (2i - 1)[log z_(i) + log(1 - z_(n+1-i))].
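Both statistics can be computed directly from the sorted transforms; a minimal sketch (a hypothetical helper using 0-based indexing, so the i-th sorted value pairs with (2i + 1)/(2n)):

```python
import math

def cvm_statistics(z):
    """W^2 and A^2 from the probability-integral transforms z_(i) = F(x_(i));
    z is sorted internally, and 0-based index i plays the role of i - 1."""
    z = sorted(z)
    n = len(z)
    w2 = sum((z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1.0 / (12 * n)
    a2 = -n - sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
                  for i in range(n)) / n
    return w2, a2

# Perfectly spaced z values attain the minimum W^2 = 1/(12n)
print(cvm_statistics([(2 * i + 1) / 20 for i in range(10)])[0])   # ~0.008333
```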

Tables 1 and 2 give upper-tail asymptotic percentage points for the statistics W² and A², for values of k between -0.9 and 0.5, and for Cases 1, 2, and 3. In Cases 2 and 3, where k must be estimated, the appropriate table should be entered at k̂, and in all tables, if k or k̂ is greater than 0.5, the table should be entered at k = 0.5. Critical points for other values of k can be obtained by interpolation; linear interpolation in Table 2 for A², Case 3, gives a maximum error in α from 0.0011 to 0.0003 as α moves through the important range R: 0.10 ≥ α ≥ 0.005, and less than 0.003 for the two values S: α = 0.5 and 0.25; for W², the corresponding figures are 0.0025 to 0.0004 over R and less than 0.007 for S. For the much less important Cases 1 and 2, linear interpolation in Table 1 gives a maximum error for A², Case 1, from 0.009 to 0.0017 over R and 0.014 for S; for W², Case 1, the figures are 0.0055 to 0.0015 over R and 0.008 for S. For Case 2, the maximum errors are smaller by a factor of approximately 10.

The points in these tables were calculated from the asymptotic theory to be presented in Section 6. Points for finite n were generated from Monte Carlo samples, using 10,000 samples for each combination of n and k. The results show that these points converge quickly to the asymptotic points for all three cases, a feature of the statistics W² and A² found also in many other applications. The asymptotic points can then be used with good accuracy for, say, n ≥ 25. More extensive versions of Tables 1 and 2, and also tables showing the convergence to the asymptotic points, were given by Choulakian and Stephens (2000).

3. EXAMPLES

In this section we consider two specific examples and then discuss the overall results when the GPD is applied to 238 Canadian rivers.

In the first example, the data are n = 72 exceedances of flood peaks (in m³/s) of the Wheaton River near Carcross in Yukon Territory, Canada. The initial threshold value τ is 27.50 (how this was found will be explained later), and the 72 exceedances, for the years 1958 to 1984, rounded to one decimal place, were 1.7, 2.2, 14.4, 1.1, 0.4, 20.6, 5.3, 0.7, 1.9, 13, 12, 9.3, 1.4, 18.7, 8.5, 25.5, 11.6, 14.1, 22.1, 1.1, 2.5, 14.4, 1.7, 37.6, 0.6, 2.2, 39, 0.3, 15, 11, 7.3, 22.9, 1.7, 0.1, 1.1, 0.6, 9, 1.7, 7, 20.1, 0.4, 2.8, 14.1, 9.9, 10.4, 10.7, 30, 3.6, 5.6, 30.8, 13.3, 4.2, 25.5, 3.4, 11.9, 21.5, 27.6, 36.4, 2.7, 64, 1.5, 2.5, 27.4, 1, 27.1, 20.2, 16.8, 5.3, 9.7, 27.5, 2.5, 27. The maximum likelihood estimates of the parameters are k̂ = -0.006 and â = 12.14. We now apply the tests presented in Section 2; the values of the test statistics are W² = 0.2389 (p-value < 0.025) and A² = 1.452 (p-value < 0.010), so the GPD does not fit the dataset well at the threshold value of 27.50. Next we follow the technique of Davison and Smith (1990), who suggested raising the threshold until the GPD fits the new values of the remaining exceedances. Here the threshold was raised successively by the value of the smallest order statistic, which is then deleted, until the p-values of the A² and W² statistics exceeded 10%. This happens when six order statistics are deleted; the threshold value is then 27.50 + 0.60 = 28.10. The GPD now fits the data quite well; details are given in Table 3. This example shows how the tests may help in choosing a threshold value in the POT model.

In this connection, we next revisit the example given by Davison and Smith (1990). Their tables 5 and 6 give the parameter estimates (Case 3) and the values of the Anderson-Darling statistic for a range of thresholds from 140 down to 70 at intervals of 10 units. The values of the Anderson-Darling statistic were compared with 1.74, the 5% point for a test of exponentiality, since no test for the GPD was available, and the statistics were not nearly significant by this criterion. Now Table 2 can be used to give the asymptotic 5% critical values for A² corresponding to the estimate k̂ of the parameter k. The first two and last two threshold results are given in Table 4. Davison and Smith pointed out the sudden increase in the value of A² at threshold level 70; against the exponential point this value is still not significant, but using Table 2, it falls just at the critical 5% level.

The Wheaton River in the previous example is one of 238 Canadian rivers for which we have similar data. We now examine how well the GPD fits the exceedances for the other rivers. First one must decide the threshold level for a given river flow. The first threshold estimate, τ, was chosen so that the number of exceedances per year could be modeled by a Poisson distribution; see, for instance, Todorovic (1979). This was done by taking τ such that, if N_τ is the number of exceedances, the mean of N_τ divided by its variance was approximately 1. The Poisson assumption will be tested more rigorously later. After the threshold level was chosen, the maximum likelihood estimates of the parameters k and a were calculated and the W² and A² tests applied. Table 5 gives the frequencies of p-values for the 238 rivers. It is clear that the GPD fits quite well; using W², only 9 rivers gave p-values less than 0.01, and using A², there were only 15. At the 0.05 level, these figures are 34 using W² and 49 using A². The results demonstrate also that A² is more sensitive (and therefore more powerful) than W² against possible outliers in the tail, as was suggested in Section 2. More details were given by Choulakian and Stephens (2000). For the 49 "rejections" using A², the threshold was increased, as described in the first example, by deleting the smallest order statistics until the p-value became larger than 0.10. Then only 10 of the 49 sets were still rejected as GPD by A². Finally, for these 10 rejected river flows, the Poisson assumption was tested by the Cramér-von Mises tests given by Spinelli and Stephens (1997). Only one dataset rejected the Poisson assumption.
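The dispersion rule for the initial threshold can be sketched as follows; `flows_by_year` and the candidate grid are hypothetical inputs, and picking the candidate whose mean/variance ratio is closest to 1 is one simple reading of the rule described above:

```python
def dispersion_ratio(counts):
    """Mean/variance ratio of yearly exceedance counts; near 1 under a Poisson model."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return m / v if v > 0 else float("inf")

def choose_threshold(flows_by_year, candidates):
    """Return the candidate threshold whose yearly exceedance counts have a
    mean/variance ratio closest to 1 (hypothetical inputs: a list of yearly
    flow lists and a list of candidate thresholds)."""
    best, best_dev = None, float("inf")
    for tau in candidates:
        counts = [sum(1 for f in year if f > tau) for year in flows_by_year]
        dev = abs(dispersion_ratio(counts) - 1.0)
        if dev < best_dev:
            best, best_dev = tau, dev
    return best
```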

Another result of interest is that 229 of the 238 values of k̂ were between -0.5 and 0.4; this confirms the findings of Hosking and Wallis (1987), who restricted attention to -0.5 < k < 0.5, where the great majority of values fall.

4. POWER OF THE FAILURE-TO-REJECT METHOD

The method of choosing the threshold by using as many exceedances as possible, subject to passing a test for GPD, can be investigated for power as follows. The data analyst wishes to fit a GPD to the exceedances far enough in the tail. She or he therefore starts with as many observations as possible, say 100, and tests; if the test fails, the smallest values are omitted one by one, testing again each time, until the test yields acceptance for the GPD. The efficiency of this procedure can now be investigated when the true distribution is not GPD. Suppose, for instance, that the true exceedance distribution is Weibull with parameter 0.75, and suppose, starting with sample size 100, that the sample fails the GPD test until K lower-order statistics have been removed; then it passes. The statistic K will vary from sample to sample, but its mean will give a measure of the power of the test. The standard deviation and other statistics from the distribution of K are also of interest. This has been investigated for various alternative distributions for the true exceedances; Table 6 provides these results. They are based on 1,000 Monte Carlo samples of initial size n = 100, 50, or 30. Only the statistic A² is reported, because A² outperforms W² at detecting tail discrepancies. Column 3 gives the initial rejection rate for a GPD test with parameters estimated, at test size 0.05; this is also the power in a conventional study of power against the specified alternative. Subsequent columns give the mean, standard deviation, the three quartiles, and the maximum value of K. Thus, for the Weibull(0.75) alternative, 509 of the 1,000 samples of size 100 were rejected as GPD initially; of these rejected samples, the mean number of order statistics to be deleted before acceptance was 8.169, with standard deviation 9.178. The distribution of K is long-tailed; the quartiles are 2, 5, and 11, respectively, but one sample needed 52 order statistics removed to achieve acceptance by the GPD test.

In Table 6, the distributions were chosen to give a reasonable starting rejection rate. Several other distributions could be supposed as alternatives (e.g., the half-normal, half-Cauchy, or gamma distributions), but these can be made very close to the GPD by suitable choice of parameters, so the initial power is small, and very few low-order statistics, if any, need be deleted to achieve GPD acceptance (cf. the numerical results in Sec. 5). In their section 9, Davison and Smith suggested that the exceedances for their example might possibly be truly modeled by a mixture of two populations, and in Table 6 we have included three mixtures of GPD as possible true models. The test statistic distinguishes quite effectively between these mixtures and a single GPD.

For all alternative distributions considered in Table 6, it may be seen that, as n increases, the mean, standard deviation, and maximum of K all increase; the results also show the increase in the power of the test as the sample size increases.

5. VERSATILITY OF THE GPD

As was suggested previously, some distributions often fitted to long-tailed data may be brought close to a GPD by suitable choice of k and a. This closeness was investigated, and results are given in Table 7. In this table, for example, the standard half-normal and standard half-Cauchy distributions (respectively, the distribution of |X| when X has the standard normal or the standard Cauchy distribution) are compared with the GPD(0.293, 1.022) for the first and the GPD(-0.710, 1.152) for the second; the standard lognormal is compared with the GPD(-0.140, 1.406). The choice of the GPD parameters can be made by equating the first two moments of the GPD to those of the distribution compared, but this does not produce as good a fit as the following maximum likelihood procedure.

A sample of 500 values was taken from the half-normal distribution, and a GPD fit was made, estimating the parameters by maximum likelihood, first solving (7) for θ̂ and then obtaining k̂ and â from (8) and (9). This was repeated 100 times, and the averages of the k̂ and â values were used as the parameters of the GPD. Although the matches are clearly not perfect, the error in α, the probability level of a given percentile, is small when one distribution is used instead of another. Similar comparisons are given for several Weibull distributions and a gamma distribution. The exponential (Weibull with parameter 1) is omitted because it is a special case of the GPD with k = 0. Again the two distributions are quite close for Weibull parameter greater than 1 (where the Weibull has a mode), but the match is less good for Weibull with parameter less than 1--for example, the Weibull(0.5) or the Weibull(0.75)--where the density rises to infinity at x = 0.

Overall, the results suggest that a GPD could often be used as a model for data with a long tail when neither a mode nor an infinite density is suggested by the nature of the variables or by the data themselves.

6. ASYMPTOTIC THEORY OF THE TESTS

In this section we summarize the asymptotic theory of Cramér-von Mises tests. The calculation of asymptotic distributions of the statistics follows a procedure described, for instance, by Stephens (1976). It is based on the fact that y_n(z) = √n {F_n(z) - z}, 0 ≤ z ≤ 1, where F_n(z) is the empirical distribution function of the z set, tends to a Gaussian process y(z) as n → ∞, and the statistics are functionals of this process. The mean of y(z) is 0; we need the covariance function ρ(s, t) = E{y(s)y(t)}, 0 ≤ s, t ≤ 1. When all the parameters are known, this covariance is ρ₀(s, t) = min(s, t) - st. When parameters are estimated, the covariance will depend in general on the true values of the estimated parameters. However, if the method of estimation is efficient, the covariance will not depend on the scale parameter a but will depend on the shape parameter k. We illustrate for Case 3 only. As the sample size n → ∞, √n(â - a, k̂ - k) has asymptotically a bivariate normal distribution with mean (0, 0) and variance-covariance matrix Σ, where

Σ = (1 - k) [ 2a²     a   ]
            [  a    1 - k ]. (10)

When both parameters a and k are estimated, the covariance function of y(z) becomes

ρ₃(s, t) = ρ₀(s, t) - {g(s)}′ Σ g(t), (11)

where s = F(x) and g(s) = (g₁(s), g₂(s))′ is a vector having coordinates

g₁(s) = ∂F/∂a = (1 - s){1 - (1 - s)^{-k}}/(ak)

and

g₂(s) = ∂F/∂k = (1 - s){k log(1 - s) - 1 + (1 - s)^{-k}}/k².

When Σ and g(s) are inserted into (11), ρ₃(s, t) will be independent of a. When k ≥ 0.5, the maximum likelihood estimates of a and k are superefficient in the sense of Darling (1955), and then the covariance and the resulting asymptotic distributions will be the same as for k = 0.5. Thus, if 0.5 ≤ k ≤ 1, the table should be entered at k = 0.5, as described in Section 2. In Case 1, the covariance of y(z) becomes

ρ₁(s, t) = ρ₀(s, t) - a²(1 - 2k) g₁(s) g₁(t) (12)

and for Case 2 it becomes

ρ₂(s, t) = ρ₀(s, t) - (1 - k)(1 - 2k) g₂(s) g₂(t)/2. (13)

In both these cases, at k = 0.5, the asymptotic covariance becomes ρ₀(s, t). This is the same as for a test for Case 0, when both a and k are known, and the asymptotic points are the same as for such a test (see Stephens 1986).
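The covariance kernels above are easy to code; in the sketch below (a hypothetical helper, valid for k ≠ 0) the a² factor is written explicitly in ρ₁ so that the scale a cancels, and at k = 0.5 both kernels reduce to ρ₀ as stated:

```python
import math

def rho0(s, t):
    """Covariance of the limiting empirical process when a and k are known."""
    return min(s, t) - s * t

def g1(s, a, k):
    """g1(s) = dF/da expressed in terms of s = F(x)."""
    return (1.0 - s) * (1.0 - (1.0 - s) ** (-k)) / (a * k)

def g2(s, k):
    """g2(s) = dF/dk expressed in terms of s = F(x)."""
    return (1.0 - s) * (k * math.log(1.0 - s) - 1.0 + (1.0 - s) ** (-k)) / k ** 2

def rho1(s, t, a, k):
    """Case 1 kernel; the a**2 factor cancels the 1/a inside g1, so the
    result is free of the scale parameter."""
    return rho0(s, t) - a * a * (1.0 - 2.0 * k) * g1(s, a, k) * g1(t, a, k)

def rho2(s, t, k):
    """Case 2 kernel, eq. (13)."""
    return rho0(s, t) - (1.0 - k) * (1.0 - 2.0 * k) * g2(s, k) * g2(t, k) / 2.0
```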

The Cramér-von Mises statistic W² is based directly on the process y(z), while A² is based on the process w(z) = y(z)/{z(1 - z)}^{1/2}; asymptotically, the distributions of the statistics are those of W² = ∫₀¹ y²(z) dz and A² = ∫₀¹ w²(z) dz. The asymptotic distribution of each statistic is a sum of weighted independent χ²₁ variables; the weights for W² must be found from the eigenvalues of an integral equation with the appropriate ρ_j(s, t) for Case j as kernel. For A², the kernel of the integral equation is ρ_A(s, t) = ρ_j(s, t)/{st(1 - s)(1 - t)}^{1/2}, the covariance of the w(z) process. Once the weights are known, the percentage points of the distributions can be calculated by Imhof's method. For details of these procedures, see, for example, Stephens (1976).

ACKNOWLEDGMENTS

We thank the editor and two referees for their many helpful suggestions. This work was supported by the Natural Sciences and Engineering Research Council of Canada.

REFERENCES

Castillo, E., and Hadi, A. S. (1997), "Fitting the Generalized Pareto Distribution to Data," Journal of the American Statistical Association, 92, 1609-1620.

Chen, G., and Balakrishnan, N. (1995), "The Infeasibility of Probability Weighted Moments Estimation of Some Generalized Distributions," in Recent Advances in Life-Testing and Reliability, ed. N. Balakrishnan, London: CRC Press, pp. 565-573.

Choulakian, V., and Stephens, M. A. (2000), "Goodness-of-Fit Tests for the Generalized Pareto Distribution," research report, Simon Fraser University, Dept. of Mathematics and Statistics.

Darling, D. (1955), "The Cramer-von Mises Test in the Parametric Case," The Annals of Mathematical Statistics, 26, 1-20.

Davison, A. C. (1984), "Modeling Excesses Over High Thresholds, With an Application," in Statistical Extremes and Applications, ed. J. Tiago de Oliveira, Dordrecht: D. Reidel, pp. 461-482.

Davison, A. C., and Smith, R. L. (1990), "Models for Exceedances Over High Thresholds" (with comments), Journal of the Royal Statistical Society, Ser. B, 52, 393-442.

Dupuis, D. J. (1996), "Estimating the Probability of Obtaining Nonfeasible Parameter Estimates of the Generalized Pareto Distribution," Journal of Statistical Computation and Simulation, 54, 197-209.

Dupuis, D. J., and Tsao, M. (1998), "A Hybrid Estimator for Generalized Pareto and Extreme-Value Distributions," Communications in Statistics--Theory and Methods, 27, 925-941.

Grimshaw, S. D. (1993), "Computing Maximum Likelihood Estimates for the Generalized Pareto Distribution," Technometrics, 35, 185-191.

Hosking, J. R. M., and Wallis, J. R. (1987), "Parameter and Quantile Estimation for the Generalized Pareto Distribution," Technometrics, 29, 339-349.

Pickands, J. (1975), "Statistical Inference Using Extreme Order Statistics," The Annals of Statistics, 3, 119-131.

Smith, R. L. (1984), "Threshold Methods for Sample Extremes," in Statistical Extremes and Applications, ed. J. Tiago de Oliveira, Dordrecht: Reidel, pp. 621-638.

_____(1989), "Extreme Value Analysis of Environmental Time Series: An Application to Trend Detection in Ground-level Ozone," Statistical Science, 4, 367-393.

_____(1990), "Extreme Value Theory," in Handbook of Applicable Mathematics (Vol. 7), ed. W. Ledermann, Chichester, U.K.: Wiley.

Spinelli, J. J., and Stephens, M. A. (1997), "Test of Fit for the Poisson Distribution," The Canadian Journal of Statistics, 25, 257-268.

Stephens, M. A. (1976), "Asymptotic Results for Goodness-of-Fit Statistics With Unknown Parameters," The Annals of Statistics, 4, 357-369.

_____(1986), "Tests Based on EDF Statistics," in Goodness-of-fit Techniques, eds. R. B. D'Agostino and M. A. Stephens, New York: Marcel Dekker, pp. 97-122.

Todorovic, P. (1979), "A Probabilistic Approach to Analysis and Prediction of Floods," in Proceedings of the International Statistical Institute, Buenos Aires (Vol. 1), pp. 113-124.

Table 1. Upper-Tail Asymptotic Percentage Points for W² (normal type) and for A² (bold), Cases 1 and 2 (Case 1: k known, a unknown; Case 2: a known, k unknown); p is Pr(W² ≥ z) or Pr(A² ≥ z), where z is the table entry.

Table 2. Case 3 (both k and a unknown): Upper-Tail Asymptotic Percentage Points for W² (normal type) and for A² (bold); p is Pr(W² ≥ z) or Pr(A² ≥ z), where z is the table entry.
