
Statistical Theory of Extremes

Statistical Analysis With Partial Information

José Tiago de Fonseca Oliveira 1

1.Academia das Ciências de Lisboa (Lisbon Academy of Sciences), Lisbon, Portugal.



Abstract

The chapter deals with some sparse results in which the samples are not complete. The analysis is concerned with drawing inferences about quantities associated with the right tail of the real \(F(.)\) when \(n\) is large, using asymptotic extreme value theory. The corresponding quantiles for the lower tail can be dealt with by the usual transformation from maxima to minima. The largest observations and subsample maxima methods are compared, and the subsample maxima method shows a definite superiority. Estimation using two or three quantiles, estimation of the parameters using block partitions of the sample, and estimation using exceedances over thresholds are also discussed.

Keywords

Extreme observations, Lower tail, Subsample method, Largest observations, Subsample maxima, Block partitions

1 . Introduction

So far we have dealt with complete samples which were supposed to have Weibull, Gumbel or Fréchet distributions, or, otherwise, that one of them could be accepted as a good approximation to the distribution of the underlying population.

Now we will deal with some sparse results in which the samples are not complete, such as the following:

  1. We know only the upper (lower) part of a sample, i.e., the upper (lower) \(m\) observations, and we want to estimate the upper (lower) tail of the distribution; see Weissman (1978, 1984) and references therein;
  2. We know that the sample of maxima can be split into \(m\) subsamples and we use only the \(\mathit{k }\) maxima of each subsample; see Hüsler and Tiago de Oliveira (1988) for \(k=1\);
  3. We use two or three quantiles to estimate the parameters; see Tiago de Oliveira (1972) and Kubat and Epstein (1980);
  4. We split the sample into disjoint parts, compute averages, and estimate the parameters; see Kubat (1982), Kubat and Epstein (1980), Hüsler and Schupbach (1986), and Neves (1988);
  5. We use excesses over thresholds in the sequence of observations; see Smith (1984) and Rösbjerg and Knudsen (1984).

2 . The use of the extreme m observations

Let \(\mathrm{ x_{1}^{"} \geq x_{2}^{"} \geq … \geq x_{n}^{"} }\)   be the decreasing order statistics in a sample of size \(\mathrm{ n }\) from a distribution function \(\mathrm{ F }\) and suppose that \(\mathrm{ F \in} ~D \mathrm{( \tilde{L} ) , }\)  i.e., that 

\(\mathrm{ F^{n} ( \lambda _{n}+ \delta _{n}~x ) \rightarrow \tilde{L} ( x )~ ( n \rightarrow \infty ) }\)  

for some constants \(\mathrm{ \lambda _{n} }\) and \(\mathrm{\delta _{n}>0. }\) We are concerned here with drawing inferences about quantities associated with the right tail of the real \(\mathrm{F ( . ) }\) when \(\mathrm{n }\) is large, using asymptotic extreme value theory.

We are thus interested in

\(\mathrm{\bar{w}=sup \{ x \vert F ( x ) <1 \} }\), the right-end point;

\(\mathrm{F ( x) }\) for large \(\mathrm{x ( <\bar{w} ) }\) ;

\(\mathrm{ \chi _{p}=Q ( p) =F^{-1} ( p ) }\)  with \(\mathrm{ p=1-c/n }\)  ( \(\mathrm{c }\)  constant)

and, hence, in \(\mathrm{ \lambda _{n} }\) and \(\mathrm{\delta _{n}>0. }\) 

Similarly we can be interested in the corresponding quantiles for the lower tail which can be dealt with by the usual transformation from maxima to minima.

The estimation method used up to now, and developed at length in Gumbel (1958), is to divide the original (unordered) data into subsamples. The maxima (or minima) of the subsamples are then used to estimate the parameters of \(\mathrm{ \tilde{L} }\), one of the three extreme value limit distributions. This method is natural when some period exists in the data, as in environmental series, and the natural subsample is the data period. This method is sometimes called the yearly data or (natural) subsamples method.

But another approach is possible. Instead of splitting the observations into \(m\) subsamples of \(n/m\) observations each and taking the maximum of each subsample, we can take the \(m\) largest observations of the whole sample of \(n\) and base the decision on them. Note that the underlying hypotheses are different: in the (natural) subsamples method we only suppose that the maxima have the same extreme value distribution, while in the largest maxima method we suppose that all observations are i.i.d. with the distribution \(F(.)\), attracted by one of the extreme value distributions; this will be discussed in Annex 4.

It will be seen in the next section that if the sample is i.i.d. with a Gumbel distribution then the subsample method is, in general, more efficient than the method of the largest \(m\) maxima using the same number of observations (the \(m\) maxima of the \(m\) subsamples of size \(n/m\), or the \(m\) largest observations, both from a sample of size \(n\)).

Let us then describe the Weissman method.

Define \(x_{ni}'' = (x_i''-\lambda_n)/\delta_n\). Then, as \(F^n(\lambda_n+\delta_n\,x) \to \tilde{L}(x)\) as \(n \to \infty\), the vector \((x_{ni}'': i=1,\dotsc,n)^T\) converges in distribution to \((M_i = \tilde{\lambda}^{-1}(Z_1+\dotsb+Z_i): i=1,\dotsc,n)^T\), where \(\tilde{\lambda}(x) = -\log\tilde{L}(x)\) and \(\{Z_i\}\) is a sequence of i.i.d. exponential random variables with mean value 1 (David, 1981). Hence for large \(n\), the \(m\) largest order statistics, up to a linear transformation, are distributed approximately as \((M_1,\dotsc,M_m)\). There are only three possible \(\tilde{\lambda}(x)\) to consider:

Gumbel: \(\mathrm{\tilde{\lambda} ( x ) =e^{-x} }\) ,

Fréchet: \(\mathrm{\tilde{\lambda} ( x ) =x^{- \alpha } ( x>0 ), ( \alpha >0 ) , }\)

Weibull: \(\mathrm{\tilde{\lambda } ( x ) = ( -x ) ^{ \alpha } ( x<0 ) , ( \alpha >0 ) }\).

Then for large values of  \(\mathrm{y }\) , if  \(\mathrm{F^{n} ( \lambda _{n}+ \delta _{n}~y ) \rightarrow \tilde{L} ( y) ~as~n \rightarrow \infty }\) , we have  

\(\mathrm{F ( y ) \approx \tilde{L}^{{1}/{n}} ( ( y- \lambda _{n} ) / \delta _{n} ) =e^{- \tilde{\lambda} ( ( y- \lambda _{n} ) / \delta _{n} ) /n} }\) ;

estimation of \(\mathrm{F ( y ) }\) is thus obtained by substitution of \(\mathrm{ (\hat{ \lambda} _{n}, \hat{\delta} _{n}) }\) for \(\mathrm{( \lambda _{n}, \delta _{n} ) }\).  

The problem is now to estimate \(\mathrm{( \lambda _{n}, \delta _{n} ) }\) from the \(m\) largest observations.

In the first case (Gumbel distribution), \(\mathrm{ \tilde{ \lambda} ^{-1} ( y ) =-log~y }\) and thus we have that

\( D_i = M_i - M_{i+1} = -\log\frac{Z_1+\dotsb+Z_i}{Z_1+\dotsb+Z_{i+1}} \)   and   \( \frac{Z_i}{i} \)

have the same distribution.

The spacings \(D_i\) \((i=1,\dotsc,m-1)\) are thus independent exponential random variables with mean values \(1/i\), independent of \(M_m\). In the other cases the ratios \(M_i/M_{i+1}\) are independent. Independent and exponential spacings among order statistics are a well-known characterization of the exponential distribution. The latter is an important member of the domain of attraction of the Gumbel distribution.

One can plot \(x_i''\) vs. the mean values \(M\{M_i\}\) \((i=1,\dotsc,m)\) for each of the three models and decide which (if any!) fits the data, i.e., whether the points lie approximately on a straight line. If \(m\) is at our disposal it can be determined as the largest value for which \(x_1'' > \dotsb > x_m''\) fit the model. Once the model is chosen the parameters can be estimated. Maximum likelihood estimators and minimum variance estimators are discussed in Weissman (1978). For the Gumbel model the maximum likelihood estimators are

\( \hat{\delta}_n = \frac{1}{m}\sum_{i=1}^{m-1}(x_i''-x_m'') = \frac{1}{m}\sum_{i=1}^{m}x_i''-x_m'' = \bar{x}_m''-x_m'' = \frac{1}{m}\sum_{i=1}^{m-1} i\,(x_i''-x_{i+1}'') \)

\(\mathrm{ \hat{ \lambda} _{n}=x_{m}^{"}+ \hat{\delta} _{n}log\,m }\) ,

and the minimum variance estimators are

\(\mathrm{ \delta _{n}^{*}=\frac{1}{m-1} \sum _{i=1}^{m-1} ( x_{i}^{"}-x_{m}^{"} ) =\frac{m}{m-1} \hat{\delta} _{n} }\)

\( \lambda_n^* = x_m'' + \delta_n^*\,(S_m - \gamma) \)

where \(\mathrm{ S_{1}=0, }\) \(\mathrm{ S_{m}=1+\frac{1}{2}+\dotsb+\frac{1}{m-1} ( m>1 ) }\) and \(\mathrm{ \gamma =.57722 }\) is Euler’s constant. Then \(\mathrm{ 2 \left( m-1 \right) \delta _{n}^{*}/ \delta _{n} }\) is (asymptotically) a \(\mathrm{ \chi ^{2} \left( 2m-2 \right) }\) variate and thus confidence intervals for \(\mathrm{ \delta _{n} }\) can be obtained. The distribution of \(\mathrm{ U_{m}= ( x_{m}^{"}- \lambda _{n} ) / \delta _{n}^{*} }\) is parameter-free and thus confidence intervals for \(\mathrm{ \lambda _{n} }\) can be obtained provided the percentage points are available. The latter are tabulated in Weissman (1978).

For the Gumbel distribution we have, as seen in the attraction conditions Chapter,

\(\mathrm{ ( Q ( 1-c/n ) - \lambda _{n} ) / \delta _{n} \rightarrow -log~c }\)   as  \(\mathrm{ n \rightarrow \infty. }\)  

 and so

\(\mathrm{ \hat{Q} ( 1-c/n ) = \hat{\lambda} _{n}+ \hat{\delta} _{n} ( -log~c ) }\)

is the approximate maximum likelihood estimator for \(\mathrm{ Q ( 1-c/n) . }\)  
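
As a computational illustration, the following sketch (in Python; the function and variable names are ours, not the text's) assembles the estimators of this section: the ML pair \((\hat{\lambda}_n, \hat{\delta}_n)\), the minimum variance pair \((\lambda_n^*, \delta_n^*)\), and the quantile estimate \(\hat{Q}(1-c/n)\), all from the \(m\) largest observations (\(m \geq 2\) assumed):

```python
import numpy as np

def gumbel_top_m(x_top, c=1.0):
    """Weissman-type estimates from the m largest observations of a sample.

    x_top : the m (>= 2) largest observations, in any order
    c     : the constant in the quantile level p = 1 - c/n
    """
    x = np.sort(np.asarray(x_top, dtype=float))[::-1]   # x_1'' >= ... >= x_m''
    m = len(x)
    # ML estimators: delta_hat = mean of the top m minus the m-th largest
    delta_hat = x.mean() - x[-1]
    lambda_hat = x[-1] + delta_hat * np.log(m)
    # minimum variance estimators, with S_m = 1 + 1/2 + ... + 1/(m-1)
    S_m = np.sum(1.0 / np.arange(1, m))
    delta_star = m / (m - 1) * delta_hat
    lambda_star = x[-1] + delta_star * (S_m - 0.57722)   # gamma = .57722
    # approximate ML estimator of the quantile Q(1 - c/n)
    q_hat = lambda_hat - delta_hat * np.log(c)
    return lambda_hat, delta_hat, lambda_star, delta_star, q_hat
```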

Similar estimates are obtained in Weissman (1981) for threshold (type 1) censoring: only values larger than some fixed \(a\) are observable and then \(m\) is random. It turns out that \(a\) plays the role of \(x_m''\) in the formulae for \(\hat{\lambda}_n, \hat{\delta}_n\) and \(\hat{Q}\).

In principle, these asymptotic results hold for every distribution function \(F(.) \in D(\Lambda(.))\) (exponential, gamma, normal, lognormal, logistic, Weibull, to name some). But for finite \(n\), better approximations can be obtained for distributions with an exponential tail. Recently Boos (1983) used \(\hat{Q}(1-c/n)\) to estimate large quantiles for various distributions, and found that, as regards the mean-square error, this method is better for some distributions but worse for others; for more details see Weissman (1984).

For the Fréchet distribution the transformation \(\mathrm{ Y_{i}^{"}=log ( x_{i}^{"}- \lambda ) }\) transforms the model to the Gumbel one. For the Weibull distribution the transformation \(\mathrm{ Y_{i}^{"}=-log ( \lambda -x_{i}^{"} ) }\) has the same effect. But in both cases the location parameter, unknown in general, appears and has to be estimated. An example follows.

The three-parameter case for the Weibull distribution for minima is found in life testing, strength distributions, fatigue failures, etc., the available data being

\( x_1' \leq x_2' \leq \dotsb \leq x_m' \)

with \( m \ll n \).

Thus we assume that \(F(x) = 1-\exp\{-((x-\lambda)/\delta)^\alpha\}\) for \(x > \lambda\), \(F(x) = 0\) if \(x \leq \lambda\); for this distribution we have \(\lambda_n = \lambda\) and \(\delta_n = \delta\,n^{-1/\alpha}\). The maximum likelihood estimators for the three parameters \((\lambda, \delta, \alpha)\) are

\(\mathrm{\hat{\delta} = ( \frac{n}{m} ) ^{1/ \hat{\alpha} } ( x’_{m}-\hat{ \lambda} ) , }\)

\(\mathrm{\hat{\alpha} =m/ \sum _{i=1}^{m}log\frac{x’_{m}-\hat{ \lambda} }{x’_{i}- \hat{\lambda} } ( >0 ) , }\)  and

\(\mathrm{ \frac{1}{m} \sum _{i=1}^{m}log\frac{x’_{m}- \hat{\lambda} }{x’_{i}- \hat{\lambda} }+\frac{m}{ \sum _{i=1}^{m} ( x’_{m}- \hat{\lambda} ) / ( x’_{i}- \hat{\lambda} ) }=1. }\)

The last equation shows that \(1/\hat{\alpha} < 1\), i.e., \(\hat{\alpha} > 1\), and so the method does not work for \(\alpha \leq 1\). We cannot use \(x_1'\) as an estimator of \(\lambda\) because \(x_1' \stackrel{a.s.}{\rightarrow} \lambda\) and so the two other equations would give \(\hat{\alpha} = 0\) and \(\hat{\delta} = +\infty\).

Maximum likelihood estimation is discussed by Hall (1982) and Smith and Weissman (1985).
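
The three likelihood equations above can be solved numerically: substituting the second equation into the third gives a single equation in \(\hat{\lambda}\) alone, whose root must lie to the left of \(x_1'\); \(\hat{\alpha}\) and \(\hat{\delta}\) then follow. A minimal sketch (Python with scipy; names are ours, and it presumes \(\alpha > 1\) so that a root exists):

```python
import numpy as np
from scipy.optimize import brentq

def weibull3_ml(x_low, n):
    """ML estimates (lambda, delta, alpha) from the m smallest of n
    observations; valid only when alpha > 1 (see the text)."""
    x = np.sort(np.asarray(x_low, dtype=float))
    m, span = len(x), x[-1] - x[0]

    def g(lam):
        # third likelihood equation, written as g(lam) = 0
        r = (x[-1] - lam) / (x - lam)        # (x_m' - lam)/(x_i' - lam) >= 1
        return np.mean(np.log(r)) + m / np.sum(r) - 1.0

    # g -> +inf as lam -> x_1'-; bracket a sign change to the left of x_1'
    # (raises StopIteration when no bracket exists, e.g. alpha <= 1)
    hi = x[0] - 1e-9 * span
    lo = next(x[0] - k * span for k in (1, 10, 100, 1000) if g(x[0] - k * span) < 0)
    lam = brentq(g, lo, hi)
    alpha = m / np.sum(np.log((x[-1] - lam) / (x - lam)))
    delta = (n / m) ** (1.0 / alpha) * (x[-1] - lam)
    return lam, delta, alpha
```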

Confidence intervals for \(\mathrm{ \lambda }\) are suggested by Weissman (1981). Using the convergence in distribution of \(\mathrm{ ( M_{1},M_{2},\dotsc,M_{m} ) ^{T} }\) conveniently adapted for minima we see that

\(\mathrm{ \frac{x’_{1}- \lambda }{x’_{m}-x’_{1}} }\)

converge in distribution to a random variable whose distribution function is

\(\mathrm{ 1- ( 1- ( \frac{x}{1+x} ) ^{ \alpha } ) ^{m-1} }\)

for \(\mathrm{ x \geq 0 }\) and \(\mathrm{ 0 }\) for \(\mathrm{ x \leq 0. }\)  

If \(\alpha\) is known then the quantile \(\chi_{m,\alpha}(p)\) of this distribution can be obtained in closed form. For large values of \(n\), \((x_1'-\lambda)/(x_m'-x_1')\) can be used as a pivotal function to form confidence intervals for \(\lambda\). If \(\alpha\) is unknown we can use the pivotal functions

\( \log\frac{x_m'-\lambda}{x_1'-\lambda} \,/\, \sum_{i=1}^{m-1}\log\frac{x_m'-\lambda}{x_i'-\lambda} \quad (m \geq 3) \)

or

\(\mathrm{log~\frac{x’_{i}- \lambda }{x’_{1}- \lambda }/log\frac{x’_{m}- \lambda }{x’_{1}- \lambda } ( 1<i<m \leq n ) }\)

to obtain confidence intervals for \(\lambda\). Under Weibull distributions for minima both have limiting distributions which do not depend on any parameter. The quantiles of these distributions are tabulated in Weissman (1981). Simulation has shown good performance when \(\alpha \leq 1\), which is useful since in this case (\(\alpha \leq 1\)) the maximum likelihood method fails.
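
For known \(\alpha\) the closed form of the quantile is immediate: solving \(1-(1-(x/(1+x))^\alpha)^{m-1} = p\) gives \(x = u/(1-u)\) with \(u = (1-(1-p)^{1/(m-1)})^{1/\alpha}\). A small sketch (Python; the function names are ours) of the resulting interval for \(\lambda\):

```python
import numpy as np

def chi_quantile(p, m, alpha):
    """p-quantile of the limit law 1 - (1 - (x/(1+x))**alpha)**(m-1), x >= 0."""
    u = (1.0 - (1.0 - p) ** (1.0 / (m - 1))) ** (1.0 / alpha)
    return u / (1.0 - u)

def lambda_interval(x_low, alpha, level=0.95):
    """Approximate confidence interval for lambda based on the pivotal
    (x_1' - lambda)/(x_m' - x_1'), with alpha known."""
    x = np.sort(np.asarray(x_low, dtype=float))
    chi = chi_quantile(level, len(x), alpha)
    return x[0] - chi * (x[-1] - x[0]), x[0]   # covers lambda w.p. ~ level
```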

Gomes (1981) investigated the maximum likelihood estimators of \((\lambda, \delta)\) using the \(m\) largest observations in each of \(n\) samples from \(\Lambda(x)\) (Weissman (1978) treated the case \(n=1\)). That is, for \(m=2\), she considered the pairs \(x_1'' \geq x_2''\). The results agree with those of Tiago de Oliveira (1972) and others when \(m=1\). She then investigated the properties of these estimators for \(m=1,2,3\) and \(m=5(5)30\), by a Monte Carlo simulation based on between 4500 and 10000 replicates for each case, and found that \(\hat{\lambda}\) is positively biased for \(m=1\) and negatively biased for \(m>1\), and that \(\hat{\delta}\) is always negatively biased. In addition, she gave a similar treatment for estimating \((\lambda, \delta)\) using moment estimators, simple best linear unbiased estimators and simple best linear invariant estimators. For \(m=3\), a simple linear estimator is one of the form

\( \sum_{j=1}^{n}\,\sum_{k=1}^{3} w_{kj}\,x_{kj}'' \)

where \(x_{kj}''\) is the \(k\)-th largest order statistic in the \(j\)-th sample. The moment estimator \(\lambda^*\) is negatively biased for \(m=1\) and positively biased for \(m>1\), and \(\delta^*\) is always positively biased. The bias properties of the maximum likelihood and of the moment estimators are thus the reverse of one another. The method of moments gave a slightly better estimator \(\lambda^*\) for small values of \(n\), but yields very poor estimators \(\delta^*\).

Other references are Smith (1986) and Gomes (1984).

Finally some reference must be made to Hill’s (1975) estimator for Fréchet distributions, \(\mathrm{ \Phi _{ \alpha } ( x/ \delta ) (i.e., \lambda =0) }\) and correspondingly to an analogous estimator for the Weibull distribution \(\mathrm{ W_{ \alpha } ( x/ \delta ) }\) (also \(\mathrm{ \lambda =0 }\)) .

For the Fréchet distribution \(\Phi_\alpha(x/\delta)\) the transformed random variable \(Y = \log X\) follows the Gumbel distribution \(\Lambda(\frac{y-\log\delta}{1/\alpha})\). The \(m\) largest values \(x_i''\ (i=1,2,\dotsc,m)\) correspond to the \(m\) largest values \(y_i'' = \log x_i''\). Using Weissman's (1978) ML estimators based on the \(m\) largest observations, we have

\(\mathrm{ 1/ \hat{\alpha} =\frac{1}{m} \sum _{1}^{m}y_{i}^{"}-y_{m}^{"} }\)

\(\mathrm{ log~ \hat{\delta} =y_{m}^{"}+ ( 1/\hat{ \alpha} ) ~log~m }\)

or equivalently      \( \hat{\alpha} = \Big( \log\frac{(\prod_{1}^{m} x_i'')^{1/m}}{x_m''} \Big)^{-1} \).

\(\mathrm{ \hat{\delta }=m^{1/ \hat{\alpha} }\cdot x_{m}^{"} }\) .

\(\hat{\alpha}\) is the well-known Hill estimator of the index \(\alpha\) of the Fréchet distribution. See also Diaz (1985).
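
In code the Hill estimator is essentially one line; the sketch below (Python; names ours) also returns the companion estimate \(\hat{\delta} = m^{1/\hat{\alpha}}\,x_m''\):

```python
import numpy as np

def hill(x, m):
    """Hill's estimator of the Fréchet index alpha (lambda = 0), with the
    companion estimator of delta, from the m largest observations."""
    x_top = np.sort(np.asarray(x, dtype=float))[-m:]    # increasing; x_top[0] is x_m''
    inv_alpha = np.mean(np.log(x_top)) - np.log(x_top[0])   # 1/alpha_hat
    return 1.0 / inv_alpha, m ** inv_alpha * x_top[0]       # (alpha_hat, delta_hat)
```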

For the Weibull distribution \(\mathrm{ W_{ \alpha } ( x/ \delta ) }\) the transformed random variable \(\mathrm{ Y=-log~X }\) (a decreasing transformation) has the Gumbel distribution \(\mathrm{\Lambda ( \frac{y+log~ \delta }{{1}/{ \alpha }}) }\). Then the smallest  \(m\) observations  \(\mathrm{ x_{1}^{’} \leq x_{2}^{’} \leq \dotsb \leq x_{m}^{’} }\) give rise to the \(m\)  largest values \(\mathrm{ y_{i}^{"}=-log~x_{i}^{’} }\). In the same way we have

\(\mathrm{ ( 1/ \hat{\alpha} ) =\frac{1}{m} \sum _{1}^{m}y_{i}^{"}-y_{m}^{"} }\)

\( (-\log\hat{\delta}) = y_m'' + (1/\hat{\alpha})\,\log m \)

or equivalently    \( \hat{\alpha} = \Big[ \log\big( x_m' \,/\, (\prod_{1}^{m} x_i')^{1/m} \big) \Big]^{-1} \)

\( \hat{\delta} = m^{-1/\hat{\alpha}}\,x_m'. \)

This result can be useful for right-censored life tests (where naturally \(\mathrm{ \lambda=0 }\) ). 

Evidently quantile estimators and predictors can be made as usual.

For more detail connected with tail estimation see Annex 4.

3 . Largest observations vs. m subsamples maxima methods; a comparison

In this section, corresponding to item 2 of the Introduction, we will closely follow Hüsler and Tiago de Oliveira (1988). Suppose that we have a sample of \(n=km\) observations and we can use only the largest \(m\) of all observations or the \(m\) maxima of natural blocks of data; in fact we are comparing what engineers call, respectively, the largest observations method and the block (yearly) maxima method. Let us denote the \(n=km\) random variables by \(x_{ij}, i=1,\dotsc,k\), \(j=1,\dotsc,m\), where \(m\) denotes the number of blocks (e.g., years) and \(k\) the number of observed random variables per block (year). Let \(y_j = \max\{x_{ij}, 1 \leq i \leq k\}\), \(j=1,\dotsc,m\), and assume that all \(x_{ij}\) have the Gumbel distribution \(\Lambda(\frac{x-\lambda}{\delta})\), so that the block (yearly) distribution of the \(y_j\) is \(\Lambda(\frac{x-\lambda}{\delta}-\log k) = \Lambda(\frac{x-(\lambda+\delta\log k)}{\delta})\); \(\lambda+\delta\log k\) is thus the block (year) location parameter.

We may use the \(m\) values \(y_j\) for estimating the values of \(\lambda\) and \(\delta\), which is the classical method, described e.g. in Gumbel (1958). As said previously, Weissman (1978) proposed an estimation based on the \(m\) largest observations among all \(n=km\) values \(x_{ij}\), which are

\(\mathrm{ x_{1}^{"}>x_{2}^{"}>…~>x_{m}^{"} }\)

with probability one.                                                                                                                    

At this point let us analyze the efficiency of Weissman's method with respect to the classical one.

We are now assuming, for the sake of comparison, that both methods can be used in a given application, i.e. that the \(x_i'', i \leq m\), as well as the \(y_j, j=1,\dotsc,m\), can be observed and that \(n\) is sufficiently large for the use of the Gumbel approximation. The comparison uses asymptotic results for both methods.

Both methods lose much information with respect to the (underlying) full sample of \(~n=km~; \) the efficiencies of both methods with respect to the full sample tend to zero as \(k~ \rightarrow \infty~ \) since the procedures are based on very special subsamples.

Then the parameters \(\mathrm{ ( \lambda + \delta ~log~k, \delta ) }\) of the \(\mathrm{ y_{j},(j=1,\dotsc,m) }\) have the usual maximum likelihood estimators \(\mathrm{ ( \hat{\lambda} + \hat{ \delta} ~log~k, \hat{\delta } ) }\).  

From the asymptotic \(\left( m \rightarrow \infty \right) \) variance-covariance matrix of \(\mathrm{ ( \hat{\lambda} + \hat{ \delta} ~log~k, \hat{\delta } ) }\), given in Chapter 5, we get for the asymptotic variance-covariance matrix \(\mathrm{\hat{\Sigma } }\)  of \(\mathrm{ ( \hat{\lambda} , \hat{\delta } ) }\),  

\( \hat{\Sigma} = \frac{\delta^2}{m} \begin{bmatrix} 1+\frac{6}{\pi^2}(1-\gamma-\log k)^2 & -6(1-\gamma-\log k)/\pi^2 \\[0.3em] -6(1-\gamma-\log k)/\pi^2 & 6/\pi^2 \end{bmatrix} \)

and, for \(\mathrm{ k>1 }\),

\(\mathrm{ \rho ( \hat{\lambda} , \hat{\delta} ) =- ( 1+\frac{ \pi ^{2}}{6 ( 1- \gamma -log\,k ) ^{2}} ) ^{-{1}/{2}} \rightarrow -1,~as~k \rightarrow + \infty }\),

but slowly.

Consider now estimation based on the \(m\) largest values \(\mathrm{ x_{1}^{"}>x_{2}^{"}>\dotso>x_{m}^{"} }\)  of the i.i.d. sample of \(\mathrm{ \{ x_{ij} \} ( i=1,\dotsc,k;\,j=1,\dotsc,m ) }\).

For fixed \(k\) and \(m\) we get the usual maximum likelihood estimators of \((\lambda, \delta)\) (with a slight change of notation) from the censored sample of the \(m\) largest observations of a (possible) sample of \(n=km\). We have, as \(k \to +\infty\),

\(\mathrm{ \lambda ^{*}=x_{m}^{"}- \delta ^{*}log~k+O_{p} ( k^{-1} ) }\)

\(\mathrm{\delta ^{*}=\bar{x}_{m}^{"}-x_{m}^{"}+O_{p} ( \frac{log\,k}{k} ) }\)

with      \(\mathrm{\bar{x}_{m}^{"}=\frac{1}{m} \sum _{1}^{m}x_{i}^{"} }\) . 

The asymptotic variance-covariance matrix of \(\mathrm{( \lambda ^{*}, \delta ^{*}) }\) is

\( \Sigma^* \sim \delta^2 \begin{bmatrix} \sigma_m^2+\frac{m-1}{m^2}(\log m)^2 & \frac{m-1}{m^2}\log m \\[0.3em] \frac{m-1}{m^2}\log m & \frac{m-1}{m^2} \end{bmatrix} \)

with

\(\mathrm{ \rho ( \lambda ^{*}, \delta ^{*} ) = ( 1+m^{2} \sigma _{m}^{2}/ ( m-1 ) ~ ( log\,m ) ^{2} ) ^{-1/2} }\)         for  \(\mathrm{m \geq 2 }\)

where  \( \sigma_m^2 = \frac{\pi^2}{6} - S_m' \),  with  \( S_m' = \sum_{1}^{m-1} j^{-2}\ (m>1),\ S_1' = 0. \)

Clearly \(\rho(\lambda^*, \delta^*) \to 1\) as \(m \to \infty\), also slowly. To compare these two procedures we will use the well-known Cramér efficiency. It is defined as

\(\mathrm{eff( ( \lambda ^{*}, \delta ^{*} ) / ( \hat{\lambda} ,\hat{ \delta} ) ) =det ( \hat{ \Sigma} ) /det ( \Sigma ^{*} ) }\)

\(\mathrm{=\frac{6}{ \pi ^{2}}\frac{1}{ \left( m-1 \right)\, \sigma _{m}^{2}} \downarrow \frac{6}{ \pi ^{2}}=.60793 }\)

as \(\mathrm{m \rightarrow \infty }\).
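
Numerically the decrease is easy to follow; a short check (Python) of the Cramér efficiency \(\frac{6}{\pi^2}\frac{1}{(m-1)\sigma_m^2}\):

```python
import numpy as np

for m in (2, 5, 10, 20, 50, 200):
    sigma2 = np.pi**2 / 6 - np.sum(1.0 / np.arange(1, m) ** 2)   # sigma_m^2
    print(m, 6 / np.pi**2 / ((m - 1) * sigma2))
# decreases from ~0.94 (m = 2) towards 6/pi^2 = .60793
```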

We could also study the efficiency defined in Tiago de Oliveira (1982) as the worst possible one for the estimation of all quantiles. It was shown that this efficiency is the smallest root of \(\det(\hat{\Sigma}-\mu\,\Sigma^*) = 0\). But as \(\mu_{max}\cdot\mu_{min} = eff((\lambda^*,\delta^*)/(\hat{\lambda},\hat{\delta})) \approx \frac{6}{\pi^2}\), we see that \(\mu_{min}\) is (approximately) smaller than \(\sqrt{6/\pi^2} = .77970\) for large \(m\). It was shown, in the basic paper referred to, that \((\hat{\lambda},\hat{\delta})\) (the yearly maxima method) is better than \((\lambda^*,\delta^*)\) (the largest maxima method) if the quantile corresponds to a probability larger than \(p=.9\), for \(m \geq 1\), using the asymptotic approximation, and in practice for every \(p\) if \(m \geq 15\), which is the general case.

This shows a definite superiority of the yearly maxima method, contrary to some widespread thinking, when both methods can be applied. Evidently a symbiosis of both methods can be useful in some cases.

These results are based on the exact assumption that the \(\{x_{ij}\}\) have the Gumbel distribution \(\Lambda\). But these results can hold even under the assumption that the distribution of the \(\{x_{ij}\}\) belongs to the domain of attraction of \(\Lambda\) with a sufficiently fast rate of convergence of \(F^n\) to \(\Lambda\). This will certainly be true if \(F\) is sufficiently close to \(\Lambda\) in an intuitive sense.

One might think, if the Gumbel distribution is valid, that the largest maxima method can always be applied for small \(m\)'s. But the assumption of the independence of the \(x_{ij}\)'s is rarely satisfied in applications. The crucial point for this method is mainly the local dependence of the \(x_{ij}\)'s. Often the largest values occur in clusters, which prevents the application of the largest maxima method; but the yearly maxima method can still be applied.

We can mention that these results may be useful for the design of experiments: when only a small amount of information is available, i.e. if only \(m\) instead of \(n=km\) observations can be recorded, a choice of design must be made, and if we are interested in large quantiles the block method should be used, chiefly because it guarantees the practical independence of the blocks.

An analysis similar to the one given here was made by Reiss (1987) for the Fréchet distribution, with special reference to the Hill estimator.

4 . Estimation using two or three quantiles

It is important, at this point, to recall that \(\chi_p^* \stackrel{P}{\rightarrow} \chi_p\), that \(\sqrt{n}\,f(\chi_p)\,\frac{\chi_p^*-\chi_p}{\sqrt{p(1-p)}}\) is asymptotically standard normal if \(0 < f(\chi_p) < \infty\), and that \((\chi_p^*, \chi_q^*)\), where \(0<p<q<1\), conveniently reduced, is an asymptotically binormal pair with standard margins and correlation coefficient \(\rho = \sqrt{\frac{p}{q}\,\frac{1-q}{1-p}} > 0\). Its extension to more sample quantiles is immediate. See in Chapter 1 “A note on the asymptotic behaviour of sample quantiles”.

Consider now the Gumbel distribution \(\mathrm{\Lambda ( ( x- \lambda ) / \delta ) }\). Then as \(\mathrm{ \chi _{p}= \lambda + \delta ( -log⁡ ( -log\,p ) ) }\), the estimation equations are

\( Q_p(n) = \chi_p^* = \lambda^* - \delta^*\,\log(-\log p) \),

\(\mathrm{ Q_{q} ( n ) = \chi _{q}^{*}= \lambda ^{*}- \delta ^{*}~log( -log~q ) }\),

and so the estimators are

\(\mathrm{ \lambda ^{*}=[ log ( -log~p ) Q_{q}-log ( -log~q ) ~Q_{p} ] /log ( \frac{log\,p}{log\,q}) }\)

\(\mathrm{\delta ^{*}= ( Q_{q}-Q_{p} ) /log ( \frac{log\,p}{log\,q} ) }\).
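
A direct transcription of these two estimators (Python; names ours, with the quantile levels defaulting to the optimal pair found below):

```python
import numpy as np

def gumbel_two_quantile(x, p=0.07, q=0.76):
    """Estimate (lambda, delta) of a Gumbel sample from the two empirical
    quantiles Q_p and Q_q."""
    Qp, Qq = np.quantile(np.asarray(x, dtype=float), [p, q])
    a, b = np.log(-np.log(p)), np.log(-np.log(q))
    delta_star = (Qq - Qp) / (a - b)             # a - b = log(log p / log q)
    lambda_star = (a * Qq - b * Qp) / (a - b)
    return lambda_star, delta_star
```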

As the variance-covariance matrix of \(\mathrm{\left( Q_{q},Q_{p} \right) }\) is

\( V \sim \frac{1}{n} \begin{bmatrix} \frac{p(1-p)}{f^2(\chi_p)} & \frac{p(1-q)}{f(\chi_p)\,f(\chi_q)} \\[0.3em] \frac{p(1-q)}{f(\chi_p)\,f(\chi_q)} & \frac{q(1-q)}{f^2(\chi_q)} \end{bmatrix} = \frac{\delta^2}{n} \begin{bmatrix} \frac{1-p}{p(\log p)^2} & \frac{1-q}{q\,\log p\,\log q} \\[0.3em] \frac{1-q}{q\,\log p\,\log q} & \frac{1-q}{q(\log q)^2} \end{bmatrix} \)

with \( \det(V) \sim \frac{\delta^4}{n^2}\,\frac{(1-q)\,(q-p)}{p\,q^2(\log p)^2(\log q)^2} \). Writing

\( [Q_p, Q_q]^T = B\,[\lambda^*, \delta^*]^T \),   where   \( B = \begin{bmatrix} 1 & -\log(-\log p) \\[0.3em] 1 & -\log(-\log q) \end{bmatrix} \),

the variance-covariance matrix of \((\lambda^*, \delta^*)\) is

\( V^* = B^{-1}\, V\, (B^{-1})^T \)

with   \( \det(V^*) = \det(V)/\det^2(B) \).

As the Cramér-Rao bound for the estimators of \(\mathrm{ ( \lambda , \delta ) }\) — corresponding to the ML estimators \(\mathrm{ ( \hat{ \lambda} , \hat{\delta } ) }\)  — is

\(\mathrm{ \hat{V}=\frac{ \delta ^{2}}{n} \mathrm{\mathrm{ \begin{bmatrix}1+\frac{6(1-\gamma)^2}{\pi^2}&\mathrm{ \frac{6(1-\gamma)}{\pi^2}}\\[0.3em] \mathrm{ \frac{6(1-\gamma)}{\pi^2}}&\mathrm{ \frac{6} {\pi^2}} \\[0.3em] \end{bmatrix}}} }\)

the Cramér efficiency is

\( \frac{\det(\hat{V})}{\det(V^*)} = \frac{6}{\pi^2}\,\frac{(\log(\frac{\log p}{\log q}))^2\; p\,q^2\,(\log p)^2\,(\log q)^2}{(1-q)\,(q-p)} \).

The maximum efficiency is obtained for \(\mathrm{ p=.07 }\)  and \(\mathrm{ q=.76 }\)  and its value is 40.8%.
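
A quick numerical check (Python) of the efficiency formula and of the stated optimum:

```python
import numpy as np

def eff(p, q):
    """Cramer efficiency of (lambda*, delta*) against the ML (Cramer-Rao) bound."""
    lp, lq = np.log(p), np.log(q)
    return (6 / np.pi**2 * np.log(lp / lq) ** 2
            * p * q**2 * lp**2 * lq**2 / ((1 - q) * (q - p)))

print(eff(0.07, 0.76))   # ~0.408, i.e. the stated 40.8%
```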

One could also compute the efficiency as defined in Tiago de Oliveira (1982), connected with the study of quantile estimators that follows.

The quantile estimator of the \(\mathrm{ \xi }\) -quantile \(\mathrm{ \chi _{ \xi } }\) is

\(\mathrm{ \chi _{ \xi }^{*} ( p,q ) =c_{1}~Q_{p}+c_{2}~Q_{q} }\)

with          \(\mathrm{ c_{1}+c_{2}=1 }\)  

and         \(\mathrm{ c_{1}~ \chi _{p}+c_{2}~ \chi _{q}= \chi _{ \xi }~or~ \left( -log\,p \right) ^{c_{1}}~ \left( -log\,q \right) ^{1-c_{1}}=-log \xi }\)

to be quasi-linearly invariant.

It is asymptotically normal with mean value \(\mathrm{ \chi _{ \xi } }\) and asymptotic variance

\(\mathrm{ V ( \chi _{ \xi }^{*} ( p,q ) ) \sim\frac{1}{n} \{ c_{1}^{2}\,\frac{p ( 1-p ) }{f^{2} ( \chi _{p} ) }+2c_{1} ( 1-c_{1} ) \frac{p ( 1-q ) }{f ( \chi _{p} ) ~f ( \chi _{q} ) }+ ( 1-c_{1} ) ^{2}\frac{q ( 1-q ) }{f^{2} ( \chi _{q} ) } \}= }\)

\(\mathrm{ \frac{ \delta ^{2}}{n} \{ c_{1}^{2}~\frac{1-p}{p ( log\,p ) ^{2}}+2c_{1} ( 1-c_{1} ) \frac{1-q}{q\,log\,p~log\,q}+ ( 1-c_{1} ) ^{2}~\frac{1-q}{q ( log\,q ) ^{2}} \} }\).

We can compute its asymptotic efficiency with respect to the Cramér-Rao bound, i.e., with respect to the ML estimator \(\hat{\chi}_\xi = \hat{\lambda} - \log(-\log\xi)\,\hat{\delta}\).

As \( V(\hat{\chi}_\xi) \sim \frac{\delta^2}{n}\{1+\frac{6}{\pi^2}(1-\gamma-\log(-\log\xi))^2\} \), the asymptotic efficiency is

\(\mathrm{ lim_{n \rightarrow \infty}~\frac{V ( \hat{\chi _{ \xi } }) }{V ( \chi _{ \xi }^{*} ( p,q ) ) } }\)

and it can be maximized in \(\mathrm{ ( p,q ) ~ ( 0<p<q<1 ) }\).

Neves (1986) has made the calculations; the optimal results, with \(p\), \(q\), and \(c_1\) (recall that \(c_2 = 1-c_1\)) given as functions of \(\xi\), are shown in Table 9.1.

Table 9.1

  ξ        p       q        c1         eff (%)
 0.01    0.066   0.933    1.143720      65
 0.02    0.072   0.943    1.104300      65
 0.03    0.008   0.051    0.339046      67
 0.05    0.014   0.086    0.360599      75
 0.10    0.030   0.180    0.412024      82
 0.20    0.070   0.350    0.459707      82
 0.30    0.130   0.510    0.524218      81
 0.40    0.210   0.640    0.574626      82
 0.50    0.310   0.740    0.613838      83
 0.60    0.430   0.820    0.653152      84
 0.70    0.560   0.880    0.678641      83
 0.80    0.710   0.930    0.723894      79
 0.85    0.777   0.948    0.716725      73
 0.89    0.010   0.755   -0.314795      68
 0.90    0.010   0.759   -0.341731      68
 0.95    0.013   0.776   -0.562649      68
 0.99    0.021   0.793   -1.115900      67

Notice that we do not always have \(\mathrm{ p< \xi <q }\) , as might be expected from a naive analysis. This is connected to the fact that the graph of efficiency has two relative maxima and the absolute maximum changes with \(\mathrm{ \xi }\). For more details and the graphs see Neves (1986). The efficiency is thus reasonable for the usual quantiles.
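
The entries of Table 9.1 can be reproduced directly from the formulas above; a sketch (Python; names ours) for the row \(\xi = 0.50\):

```python
import numpy as np

GAMMA = 0.57722

def c1(p, q, xi):
    """c1 defined by (-log p)^c1 (-log q)^(1-c1) = -log xi."""
    return np.log(np.log(xi) / np.log(q)) / np.log(np.log(p) / np.log(q))

def eff(p, q, xi):
    """Efficiency of chi*_xi(p, q) relative to the ML quantile estimator."""
    c = c1(p, q, xi)
    lp, lq = np.log(p), np.log(q)
    v_star = (c**2 * (1 - p) / (p * lp**2)
              + 2 * c * (1 - c) * (1 - q) / (q * lp * lq)
              + (1 - c) ** 2 * (1 - q) / (q * lq**2))
    v_ml = 1 + 6 / np.pi**2 * (1 - GAMMA - np.log(-np.log(xi))) ** 2
    return v_ml / v_star

print(c1(0.31, 0.74, 0.5), eff(0.31, 0.74, 0.5))   # ~0.6138 and ~0.83
```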

Having thus sketched the way of analysing the use of quantiles, for the next distributions we will concentrate only on the efficiency of quantile estimators.

The Fréchet distribution has the form \(\Phi_\alpha((x-\lambda)/\delta) = 0\) if \(x \leq \lambda\), \(\Phi_\alpha((x-\lambda)/\delta) = \exp\{-((x-\lambda)/\delta)^{-\alpha}\}\) if \(x \geq \lambda\). Supposing \(\lambda\) known \((\lambda = \lambda_0 = 0\) for convenience\()\) we know that the increasing transformation \(Y = \log X\) leads to a Gumbel distribution \(\Lambda(\frac{y-\log\delta}{1/\alpha})\). The empirical quantiles of the \(\{y_i\}\) are the corresponding ones of the \(\{x_i\}\), and so the estimator of the \(\xi\)-quantile \(\chi_\xi\) is given by \(\log\chi_\xi^* = c_1\log Q_p + (1-c_1)\log Q_q\), i.e., \(\chi_\xi^* = Q_p^{c_1}\,Q_q^{1-c_1}\); recall that this is in correspondence with the second condition which, here, takes the form \(c_1\log\chi_p + (1-c_1)\log\chi_q = \log\chi_\xi\) or \(\chi_\xi = \chi_p^{c_1}\,\chi_q^{1-c_1}\), defining \(c_1(p,q,\xi)\).

\(\chi_\xi^*\) is homogeneous, i.e., if the \(\{x_i\}\) are multiplied by \(c(>0)\) the same happens to \(\chi_\xi^*\); as the asymptotic variance of \(\log\chi_\xi^*\) is the same as that of \(\chi_\xi^*\) for the Gumbel distribution above, the \(\delta\)-method shows that \(\chi_\xi^*(p,q)\) is also asymptotically normal with mean value \(\chi_\xi\) and asymptotic variance \(\frac{\delta^2}{(-\log\xi)^{2/\alpha}\,\alpha^2 n}\,\{1+(1-\gamma-\log(-\log\xi))^2\}\); the efficiency is the same as before. Notice that the variance depends on \(\delta\) and \(\alpha\) and not only on \(\alpha\) as could be expected (\(1/\alpha\) is the dispersion parameter of the transformed variable \(y\)). This is, evidently, connected with the homogeneity, but not the quasi-linearity, of the estimator.

If for the Fréchet distribution we have \(\mathrm{ ( \lambda , \delta ) }\) unknown, but \(\mathrm{ \alpha = \alpha _{0} }\) is known, the situation is completely analogous to the location-dispersion situation above. The \(\mathrm{ \xi }\)-quantile is \(\mathrm{ \chi _{ \xi }= \lambda + ( -log \xi ) ^{-1/ \alpha _{0}}\cdot \delta }\) and \(\mathrm{ c_{1}=c_{1} ( p,q, \xi ) }\) is defined by \(\mathrm{ c_{1}~ \chi _{p}+ ( 1-c_{1} ) ~ \chi _{q}= \chi _{ \xi } }\) or \(\mathrm{ c_{1} ( -log~p ) ^{-{1}/{ \alpha _{0}}}+ ( 1-c_{1} ) ~ ( -log~q ) ^{-{1}/{ \alpha _{0}}}= ( -log ~\xi ) ^{-1/ \alpha _{0}} }\). The estimator \(\mathrm{ \chi _{ \xi }^{*} }\) of the \(\mathrm{ \xi }\) -quantile is

\(\mathrm{ \chi _{ \xi }^{*}=c_{1}Q_{p}+ ( 1-c_{1} ) Q_{q} }\)  

which is asymptotically normal with mean-value \(\mathrm{ \chi _{ \xi } }\) and asymptotic variance

\(\mathrm{ V ( \chi _{ \xi }^{*} ) \sim \frac{1}{n} \{ c_{1}^{2}~\frac{p ( 1-p ) }{f^{2} ( \chi _{p} ) }+2c_{1} ( 1-c_{1} ) \frac{p ( 1-q ) }{f ( \chi _{p} ) ~f ( \chi _{q} ) }+ ( 1-c_{1} ) ^{2}~\frac{q ( 1-q ) }{f^{2} ( \chi _{q} ) } \} }\)

\(\mathrm{ =\frac{ \delta ^{2}}{ \alpha _{0}^{2}n} \{ c_{1}^{2}\frac{1-p}{p \left( -log\,p \right) ^{2+2/ \alpha _{0}}}+2c_{1} \left( 1-c_{1} \right) \frac{1-q}{q\, \left( log\,p\cdot\log\,q \right) ^{1+1/ \alpha _{0}}}+ \left( 1-c_{1} \right) ^{2}\frac{1-q}{q \left( -logq \right) ^{2+2/ \alpha _{0}}} \} }\),  

and the ML estimator \(\hat{\chi}_\xi = \hat{\lambda} + (-\log\xi)^{-1/\alpha_0}\,\hat{\delta}\), associated with the Cramér-Rao bound, has mean value \(\chi_\xi\) and asymptotic variance

\( V(\hat{\chi}_\xi) \sim \frac{\delta^2}{n}\{\frac{(-\log\xi)^{-2/\alpha_0}}{\alpha_0^2} + \frac{(1-\Gamma(2+1/\alpha_0)\,(-\log\xi)^{-1/\alpha_0})^2}{(\alpha_0+1)^2\,(\Gamma(1+2/\alpha_0)-\Gamma^2(1+1/\alpha_0))}\} \).

The asymptotic efficiency can then be computed as before. For \(\mathrm{ \alpha _{0}=3 }\)  we have

Table 9.2 (\( \alpha_0 = 3 \))

  ξ         p        q         c1         eff (%)
 0.010    0.0399   0.8632    1.062470      64
 0.015    0.0500   0.8500    1.064880      63
 0.018    0.0045   0.0320    0.379776      69
 0.020    0.0050   0.0356    0.362109      71
 0.050    0.0132   0.0924    0.407834      80
 0.100    0.0300   0.1890    0.465436      82
 0.200    0.0770   0.3720    0.550790      82
 0.300    0.1500   0.5180    0.613496      84
 0.400    0.2450   0.6280    0.655708      86
 0.500    0.3570   0.7150    0.688727      87
 0.600    0.4780   0.7870    0.713217      84
 0.700    0.6050   0.8490    0.732972      78
 0.740    0.6500   0.9000    0.788330      73
 0.750    0.0043   0.5952   -0.399820      73
 0.800    0.0050   0.6090   -0.558671      73
 0.900    0.0070   0.6320   -1.155680      72
 0.950    0.0082   0.6448   -1.901690      72
 0.990    0.0095   0.6591   -4.454210      71

We can make comments similar to those for the Gumbel distribution.

Let us recall that, as seen before, \(\mathrm{ \Phi _{ \alpha } ( 1+\frac{x- \lambda }{ \alpha ~ \delta } ) = \Phi _{ \alpha } ( \frac{x- ( \lambda - \alpha ~ \delta ) }{ \alpha ~ \delta } ) \rightarrow \Lambda ( \frac{x- \lambda }{~ \delta } ) }\) as \(\mathrm{ \alpha \rightarrow +\infty }\). Then \(\mathrm{ \lambda - \alpha ~ \delta }\) takes the place of \(\mathrm{ \lambda }\) and \(\mathrm{ \alpha\, \delta }\) that of \(\mathrm{ \delta }\) in the previous formulation, and the \(\mathrm{ \xi }\) -quantile is \(\mathrm{ \tilde{\chi _{ \xi }}= \lambda - \alpha ( 1- ( -log~ \xi ) ^{-1/ \alpha } ) ~ \delta }\) and for fixed \(\mathrm{ \alpha_0 }\) the ML estimator of \(\mathrm{ \tilde{\chi _{ \xi }}~is~ \tilde{\tilde{\chi _{ \xi }}}=\tilde{ \lambda} - \alpha _{0} ( 1- ( -log~ \xi ) ^{-1/ \alpha _{0}} ) \tilde{ \delta} }\) where \(\mathrm{ \tilde{ \lambda } }\) and \(\mathrm{ \tilde{ \delta } }\) are the ML estimators of the new parameters \(\mathrm{\left( \lambda , \delta \right) }\). \(\mathrm{\tilde{\tilde{\chi _{ \xi }}}}\) is also asymptotically normal with mean value \(\mathrm{\tilde{\chi _{ \xi }}}\) and variance asymptotic to

\(\mathrm{ ~\frac{ \alpha _{0}^{2}~ \delta ^{2}}{n} \{ \frac{ ( -log \xi ) ^{-2/ \alpha _{0}}}{ \alpha _{0}^{2}}+\frac{ ( 1- \Gamma ( 2+1/ \alpha _{0} ) ~ ( -log \xi ) ^{-1/ \alpha _{0}} ) ^{2}}{ ( \alpha _{0}+1 ) ^{2} ( \Gamma ( 1+{2}/{ \alpha _{0}} ) - \Gamma ^{2} ( 1+1/ \alpha _{0}) ) } \} }\);

when \(\alpha_0 \to +\infty\) we see that \(\tilde{\chi}_\xi \to \lambda - \log(-\log\xi)\,\delta\) and that the variance is asymptotic to \(\frac{\delta^2}{n}\{1+\frac{6}{\pi^2}(1-\gamma-\log(-\log\xi))^2\}\), which are the corresponding values for the Gumbel distribution \(\Lambda((x-\lambda)/\delta)\), as could be expected from

\( \Phi_{\alpha_0}(1+\frac{x-\lambda}{\alpha_0\,\delta}) \rightarrow \Lambda(\frac{x-\lambda}{\delta})~as~\alpha_0 \rightarrow \infty \).

Consider now the three-parameter case for the Fréchet distribution.

It is not possible to obtain linear combinations of three quantiles \((Q_p, Q_q, Q_r)\), with constant coefficients depending only on \((p,q,r)\), to estimate the parameters; the coefficients would always depend also on the \(\alpha\) being estimated, as could be presumed because \(\alpha\) already appears, in a weak form, in the estimation of the \(\xi\)-quantile when \(\lambda = \lambda_0\) is known. The parameter \(\alpha\) can be estimated by the location-dispersion-free ratio

\(\mathrm{ \frac{Q_{r}-Q_{q}~}{Q_{q}-Q_{p}}{\mathrm{{\stackrel{\mathrm{p}}{\rightarrow}} \, }}g \left( \alpha \right) =\frac{ \left( -log\,r \right) ^{-1/ \alpha }- \left( -log\,q \right) ^{-1/ \alpha }}{ \left( -log\,q \right) ^{-{1}/{ \alpha }}- \left( -log\,p \right) ^{-1/ \alpha }} }\).

Thus the estimator \(\mathrm{ \alpha^* }\) can be given by \(\mathrm{ g ( \alpha ^{*} ) =\frac{Q_{r}-Q_{q}}{Q_{q}-Q_{p}} }\) which, by the \(\mathrm{ \delta }\)-method, can be shown to be asymptotically normal with mean value \(\mathrm{ \alpha }\) and variance \(\mathrm{ O ( n^{-1} ) }\); the triple \(\mathrm{ ( \lambda ^{*}, \delta ^{*}, \alpha ^{*} ) }\) is also asymptotically trinormal with mean value \(\mathrm{( \lambda , \delta , \alpha ) }\) and variance-covariance matrix \(\mathrm{ O ( n^{-1} ) }\) .  

A simple solution is to take \(\mathrm{ \frac{log\,p}{log\,q}=c>1 }\)  and \(\mathrm{\frac{log\,r}{log\,q}=c^{-1}<1 }\) ; we get

\(\mathrm{ \frac{Q_{r}-Q_{q}~}{Q_{q}-Q_{p}}=c^{1/ \alpha } }\)   and   \(\mathrm{ p=q^{c},q=r^{c} }\);

Any choice of \(\mathrm{ c }\)  gives a system for the estimation of  \(\mathrm{ \alpha }\).

The best choice of \(\mathrm{ \left( p,q,r \right) }\) for each \(\mathrm{ \xi }\), or even the choice of \(\mathrm{ \left( p,q,r \right) }\) that maximizes the asymptotic efficiency, has not yet been studied and should not be expected to be very efficient.

The method only seems useful, at present, to obtain the first estimates of \(\mathrm{( \lambda , \delta , \alpha ) }\) from the equations

\(\mathrm{ Q_{p}= \chi _{p}^{*}= \lambda ^{*}+ ( -log~p ) ^{-1/ \alpha ^{*}} \delta ^{*} }\)

\(\mathrm{ Q_{q}= \chi _{q}^{*}= \lambda ^{*}+ ( -log~q ) ^{-1/ \alpha ^{*}} \delta ^{*} }\)

\(\mathrm{ Q_{r}= \chi _{r}^{*}= \lambda ^{*}+ ( -log~r ) ^{-1/ \alpha ^{*}} \delta ^{*} }\)

to be used to seed the solution of the ML equations and then to estimate the quantiles; when these estimators are used to estimate the quantiles, their asymptotically normal behaviour can be obtained, through simple but lengthy computations, by the use of the \(\delta\)-method.

Recall that for the Fréchet distribution we must always have \(\mathrm{ \lambda ^{*} \leq X_{1}^{’} }\), which imposes a new condition on the \(\mathrm{ ( X_{1}^{’},Q_{p},Q_{q} ) ~or~ ( X_{1}^{’},Q_{p},Q_{q},Q_{r} ) }\) to allow the use of the method; note that \(\mathrm{ prob \{ \lambda ^{*} \leq X_{1}^{’} \} \rightarrow 1 }\).
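
A sketch (Python; names ours) of this seeding procedure, using the simple choice \(p=q^c\), \(r=q^{1/c}\) so that \(\alpha^* = \log c\,/\,\log\frac{Q_r-Q_q}{Q_q-Q_p}\):

```python
import numpy as np

def frechet_seed(x, q=0.5, c=2.0):
    """First estimates of (lambda, delta, alpha) for the three-parameter
    Fréchet model, from the three quantiles p = q**c < q < r = q**(1/c)."""
    p, r = q**c, q ** (1.0 / c)
    Qp, Qq, Qr = np.quantile(np.asarray(x, dtype=float), [p, q, r])
    alpha = np.log(c) / np.log((Qr - Qq) / (Qq - Qp))   # ratio = c**(1/alpha)
    yp, yq = (-np.log(p)) ** (-1 / alpha), (-np.log(q)) ** (-1 / alpha)
    delta = (Qq - Qp) / (yq - yp)
    lam = Qp - yp * delta
    return lam, delta, alpha
```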

Consider finally the Weibull distribution (for minima) \(W_\alpha((x-\lambda)/\delta)\) and suppose that \(\lambda = \lambda_0\) is known (\(\lambda_0 = 0\) for convenience). Then by the transformation \(Y = -\log X\) we get a Gumbel random variable with parameters \((-\log\delta, 1/\alpha)\). As the transformation is decreasing we have \(\chi_{1-\xi}(Gumbel) = -\log\chi_\xi(Weibull)\), and so the estimator of the \(\xi\)-quantile is

\(\mathrm{ \chi _{ \xi }^{*}=Q_{p}^{d_{1}}~Q_{q}^{1-d_{1}} }\)

where \(d_1(p,q,\xi) = c_1(1-q, 1-p, 1-\xi)\); \(\chi_\xi^*\) is also homogeneous; as the asymptotic variance of \(-\log\chi_\xi^*\) is the same as that of \(\chi_{1-\xi}^*\) for the Gumbel distribution, the \(\delta\)-method shows that \(\chi_\xi^*(p,q)\) is also asymptotically normal with mean value \(\chi_\xi\) and asymptotic variance \(V(\chi_\xi^*(p,q)) \sim \frac{\delta^2}{(-\log(1-\xi))^{2/\alpha}\,\alpha^2\,n}\,\{1+(1-\gamma-\log(-\log(1-\xi)))^2\}\), the efficiency being the same as before with the exchange of \((\xi, p, q)\) by \((1-\xi, 1-p, 1-q)\).

The same could be obtained by noting that if \(\mathrm{ X }\) has the distribution \(\mathrm{ W_{ \alpha } ( x/ \delta ) }\) then \(\mathrm{ 1/X }\) has the distribution \(\mathrm{ \Phi _{ \alpha } ( {x}/{{( 1}/{ \delta }} ) ) }\) .

Suppose now that we have the Weibull distribution (for minima) with \(\mathrm{ \alpha = \alpha _{0}>2 }\) known, i.e., the distribution is \(\mathrm{ W_{ \alpha _{0}} ( ( x- \lambda ) / \delta ) }\) whose quantiles are  \(\mathrm{ \chi _{ \xi }= \lambda + ( -log \,( 1- \xi ) ) ^{1/ \alpha _{0}}~ \delta }\). Consider the estimator \(\mathrm{ \chi _{ \xi }^{*}= \chi _{ \xi }^{*} ( p,q ) =c_{1}~Q_{p}+c_{2}~Q_{q} }\) with \(\mathrm{ c_{1}+c_{2}=1 }\) and \(\mathrm{ c_{1} ( -log ( 1-p) ) ^{{1}/{ \alpha _{0}}}+c_{2} ( -log ( 1-q ) ) ^{{1}/{ \alpha _{0}}}= ( -log ( 1- \xi ) ) ^{{1}/{ \alpha _{0}}} }\) . Evidently \(\mathrm{ \chi _{ \xi }^{*} }\) is asymptotically normal with mean value \(\mathrm{ \chi _{ \xi } }\)  and asymptotic variance

\(\mathrm{ V ( \chi _{ \xi }^{*} ) \sim \frac{ \delta ^{2}}{ \alpha _{0}^{2}~n} \{ c_{1}^{2}~\frac{p}{ ( 1-p ) ( -log⁡ ( 1-p ) ) ^{2-{2}/{ \alpha _{0}}}}+2~c_{1} ( 1-c_{1} ) }\)

\( \frac{p}{(1-p)\,(\log(1-p)\,\log(1-q))^{1-1/\alpha_0}} + (1-c_1)^2\,\frac{q}{(1-q)(-\log(1-q))^{2-2/\alpha_0}}\} \).

 The ML estimator, for \(\mathrm{ \alpha _{0}>2 }\) , associated with the Cramér-Rao bound \(\mathrm{ \hat{\chi _{ \xi }}=\hat{ \lambda }+ ( -log ( 1- \xi ) ) ^{{1}/{ \alpha _{0}}}~ \hat{\delta } }\) has the mean value \(\mathrm{ \chi _{ \xi } }\)  and asymptotic variance

\( V(\hat{\chi}_\xi) \sim \frac{\delta^2}{n}\{\frac{(-\log(1-\xi))^{2/\alpha_0}}{\alpha_0^2} + \frac{(1-\Gamma(2-1/\alpha_0)\,(-\log(1-\xi))^{1/\alpha_0})^2}{(\alpha_0-1)^2\,\{\Gamma(1-2/\alpha_0)-\Gamma^2(1-1/\alpha_0)\}}\} \).

Note the similarity of this result (for \(\mathrm{ \alpha _{0}>2 }\) ) with the corresponding one for the Fréchet distribution already given.

Let us obtain an asymptotic result analogous to the one connected with the convergence \(\Phi_\alpha(1+\frac{x-\lambda}{\alpha\,\delta}) \rightarrow \Lambda(\frac{x-\lambda}{\delta})\).

In our case we have \(1-W_\alpha(1-\frac{x-\lambda}{\alpha\,\delta}) = 1-W_\alpha(-\frac{x-(\lambda+\alpha\,\delta)}{\alpha\,\delta}) \rightarrow \Lambda(\frac{x-\lambda}{\delta})\), \(\alpha \rightarrow \infty\), and the \(\xi\)-quantile is \(\tilde{\chi}_\xi = \lambda + \alpha(1-(-\log\xi)^{1/\alpha})\,\delta\), corresponding to the substitution of \(\lambda\) by \(\lambda+\alpha\,\delta\), of \(\delta\) by \(\alpha\,\delta\) and of \(\xi\) by \(1-\xi\), as we are, for \(W_\alpha\), dealing with minima and for \(\Lambda\) dealing with maxima.

\(\mathrm{ \tilde{\tilde{\chi _{ \xi }}}= \tilde{\lambda }+ \alpha _{0} ( 1- ( -log~ \xi ) ^{1/ \alpha _{0}} ) \tilde{ \delta} }\)  is the ML estimator of \(\mathrm{ \tilde{\chi _{ \xi }} }\), for \(\mathrm{ \alpha _{0}>2 }\), \(\mathrm{( \tilde{ \lambda} , \tilde{\delta} ) }\) being the ML estimators of the new \(\mathrm{ ( \lambda, \delta ) }\).

Thus, as the transformations are linear, we know that \(\tilde{\tilde{\chi}}_\xi\) is asymptotically normal with mean value \(\tilde{\chi}_\xi\) and variance

\(\mathrm{ V ( \tilde{\tilde{ \chi _{ \xi } }}) \sim \frac{ \alpha _{0}^{2}~ \delta ^{2}}{n} \{ \frac{ ( -log~ \xi ) ^{2/ \alpha _{0}}}{ \alpha _{0}^{2}}+\frac{ ( 1- \Gamma ( 2-1/ \alpha _{0} ) ( -log~ \xi ) ^{1/ \alpha _{0}} ) ^{2}}{ ( \alpha _{0}-1 ) ^{2}~ ( \Gamma ( 1-{2}/{ \alpha _{0}} ) - \Gamma ^{2} ( 1-1/ \alpha _{0} ) ) } \} }\).

Letting \(\alpha_0 \rightarrow \infty\), and corresponding to the fact that \(1-W_{\alpha_0}(1-\frac{x-\lambda}{\alpha_0\,\delta}) \rightarrow \Lambda(\frac{x-\lambda}{\delta})\), we see that \(\tilde{\chi}_\xi \rightarrow \lambda - \log(-\log\xi)\,\delta\) and that the variance is asymptotic to \(\frac{\delta^2}{n}\{1+\frac{6}{\pi^2}(1-\gamma-\log(-\log\xi))^2\}\), which are the corresponding values for the Gumbel distribution, as happened for the Fréchet distribution before.

Consider now the three-parameter case for the Weibull distribution (for minima). The estimation is completely analogous to the previous one for the Fréchet distribution, with the added difficulty of not having regular ML estimators if \(\alpha \leq 2\). The equations are analogous to the ones for the Fréchet distribution and quantile estimation is also analogous.

We must recall that any estimator \(\lambda^*\) should be such that \(\lambda^* \leq X_1'\), which imposes conditions on \((X_1', Q_p, Q_q)\) or \((X_1', Q_p, Q_q, Q_r)\) to allow the use of the method; although we know that \(Prob\{\lambda^* \leq X_1'\} \rightarrow 1\).

See also the exercises concerning the use of \(\mathrm{ X_{1}^{’} }\) and of one or two empirical quantiles to estimate the parameters of the Fréchet distribution or the Weibull distribution (for minima) and their \(\mathrm{ \xi }\)-quantiles.

5 . Estimation of the parameters using block partitions of the sample

As a matter of convenience, suppose we have the i.i.d. sample \(\mathrm{ ( x_{1},\dotsc,x_{n} ) }\), the ordered sample \(\mathrm{ ( x_{1}^{’} \leq x_{2}^{’} \leq \dotso \leq x_{n}^{’}) }\), we choose probability levels \(\mathrm{ ( p,q ) }\) with \(\mathrm{ 0<p<q<1 }\), and we split the sample into three blocks:

\(\mathrm{ x_{1}^{’} \leq \dotso \leq x_{ [ np ] }^{’} }\)           \(\mathrm{ with~average~\bar{x}_{1}= \sum _{1}^{ [ np ] }x_{i}^{’}/ [ np ] }\) ,

\(\mathrm{ x_{ [ np ] +1}^{’} \leq \dotso \leq x_{ [ nq ] }^{’} }\)     \(\mathrm{ with~average~\bar{x}_{2}= \sum _{ [ np ] +1}^{ [ nq ] }x_{i}^{’}/ ( [ nq ] - [ np ] ) }\),

\( x_{[nq]+1}' \leq \dotso \leq x_n' \)        \( with~average~\bar{x}_3 = \sum_{[nq]+1}^{n} x_i'/(n-[nq]) \);

we are supposing \(\mathrm{ n }\) sufficiently large such that \(\mathrm{ 1< [ np ] < [ nq ] <n~or~p>1/n~and~q-p>1/n }\).

As can be expected, the triple \((\bar{x}_1, \bar{x}_2, \bar{x}_3)\) has an asymptotic trinormal distribution. Their mean values are \((\lambda+\delta\mu_1, \lambda+\delta\mu_2, \lambda+\delta\mu_3)\) and the variance-covariance matrix is

\(\mathrm{\frac{ \delta ^{2}}{n} \Omega ^{-1} \sim \frac{ \delta ^{2}}{n} \begin{bmatrix} \mathrm{ {\sigma_1^2/p}} & {\sigma_{12}} & {\sigma_{13}} \\[0.3em] {\sigma_{12}} & \mathrm{ {\sigma_{2}^2/(q-p)} } & {\sigma_{23}} \\[0.3em] {\sigma_{13}} & \mathrm{{\sigma_{23}} } & \mathrm{ {\sigma_{3}^2/(1-q)}} \end{bmatrix} }\)

where \(\mathrm{ F ( \frac{x- \lambda }{ \delta }) }\) is the form of the distribution function of the \(\mathrm{ x_i }\). We have, for the mean values,

\(\mathrm{ \mu _{1}=\frac{1}{p} \int _{0}^{p}F^{-1} ( w ) d~w }\) ,

\(\mathrm{ \mu _{2}=\frac{1}{q-p} \int _{p}^{q}F^{-1}( w ) d~w }\) ,         and

\(\mathrm{ \mu _{3}=\frac{1}{1-q} \int _{q}^{1}F^{-1} ( w ) d~w }\) ,

and for the terms of the variance-covariance matrix:

\(\mathrm{ \sigma _{1}^{2}=\frac{1}{p} \int _{0}^{p} ( F^{-1} ( w ) ) ^{2}d~w- \mu _{1}^{2}+ ( \chi _{p}- \mu _{1} ) ^{2} ( 1-p) }\),

\(\mathrm{ \sigma _{2}^{2}=\frac{1}{q-p} \int _{p}^{q} ( F^{-1} ( w ) ) ^{2}d~w- \mu _{2}^{2}+\frac{1}{q-p} \{ p ( 1-p ) ( \mu _{2}-F^{-1} ( p ) ) ^{2}+ }\)

\( +\,q(1-q)\,(F^{-1}(q)-\mu_2)^2 + 2\,p(1-q)\,(\mu_2-F^{-1}(p))\,(F^{-1}(q)-\mu_2)\} \),

\(\mathrm{ \sigma _{3}^{2}=\frac{1}{1-q} \int _{q}^{1} ( F^{-1} ( w ) ) ^{2}d~w- \mu _{3}^{2}+q ( \mu _{3}-F^{-1}( q ) ) ^{2} }\) , 

\(\mathrm{ \sigma _{12}= ( F^{-1} ( p ) - \mu _{1} ) [ \mu _{2}+\frac{1-q}{1-p}~F^{-1} ( q ) -\frac{1-p}{q-p}~F^{-1} ( p ) ] }\) ,

\(\mathrm{ \sigma _{13}= ( \mu _{3}-F^{-1} ( q )) ~ ( F^{-1} ( p ) - \mu _{1} ) }\),   and

\(\mathrm{ \sigma _{23}= ( \mu _{3}-F^{-1} ( q ) ) ( - \mu _{2}+\frac{q}{q-p}~F^{-1} ( q ) -\frac{p}{q-p}~F^{-1} ( p ) ) }\).

Notice that these values are expressed in the reduced distribution function.

Our purpose is to determine, first, coefficients \((l_{1},l_{2},l_{3} ) \) and \(\mathrm{ ( d_{1},d_{2},d_{3} ) }\) such that

\(\mathrm{\lambda ^{*}= \sum _{1}^{3}\mathit{l}_{i}~\bar{x}_{i} }\)   and  \(\mathrm{ \delta ^{*}= \sum _{1}^{3}d_{i}~\bar{x}_{i} }\)

are the least-squares estimators of \((\lambda, \delta)\). Denoting \([\mu]^T = (\mu_1,\mu_2,\mu_3)\), \([1]^T = (1,1,1)\), \([\bar{x}]^T = (\bar{x}_1,\bar{x}_2,\bar{x}_3)\), and by \(\Omega\), as written before, the inverse of the variance-covariance matrix, and putting \(\Gamma = \Omega([1][\mu]^T - [\mu][1]^T)\Omega/\Delta\) where \(\Delta = ([1]^T\Omega[1])\,([\mu]^T\Omega[\mu]) - ([1]^T\Omega[\mu])^2\), we have

\(\mathrm{ \lambda ^{*}=- [ \mu ] ^{T}~ \Gamma [ \bar{x} ] }\),

\( \delta^* = [1]^T\,\Gamma\,[\bar{x}] \),

the coefficients of the linear combinations being, in obvious notation,

\(\mathrm{[ \mathit{l} ] ^{T}=- [ \mu ] ^{T}~ \Gamma }\)     and      \(\mathrm{[ d ] ^{T}= [ 1 ] ^{T}~ \Gamma }\).

The pair \(\mathrm{( \lambda ^{*}, \delta ^{*} ) }\) is asymptotically binormal with mean values \(\mathrm{ ( \lambda , \delta ) }\) and variance-covariance matrix

\( \frac{\delta^2}{n\,\Delta} \begin{bmatrix} [\mu]^T\,\Omega\,[\mu] & -[1]^T\,\Omega\,[\mu] \\[0.3em] -[1]^T\,\Omega\,[\mu] & [1]^T\,\Omega\,[1] \end{bmatrix} \)

and the asymptotic efficiency relative to the Cramér-Rao bounds (corresponding to the ML estimators) is

\( eff = \frac{\Delta}{\delta^4\,\{M(\frac{\partial \log f}{\partial\lambda})^2 \cdot M(\frac{\partial \log f}{\partial\delta})^2 - M(\frac{\partial \log f}{\partial\lambda}\,\frac{\partial \log f}{\partial\delta})^2\}} \)

where     \(\mathrm{ f ( \frac{x- \lambda }{ \delta } ) =\frac{d~F(( {x- \lambda ) }/{ \delta } ) }{d~x}=\frac{1}{ \delta }~F’ ( \frac{x- \lambda }{ \delta } ) }\).
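
Since the \(\Gamma\)-expressions above are generalized least squares in disguise, the estimators can also be computed by solving the normal equations directly; a minimal sketch (Python), taking the reduced mean vector \([\mu]\) and the reduced variance-covariance matrix \(\Omega^{-1}\) as inputs:

```python
import numpy as np

def block_ls(mu, omega_inv, xbar):
    """Least-squares estimates of (lambda, delta) from block averages,
    under the model E[xbar_i] = lambda + delta * mu_i, with reduced
    variance-covariance matrix omega_inv (so Omega is its inverse)."""
    mu, xbar = np.asarray(mu, float), np.asarray(xbar, float)
    X = np.column_stack([np.ones_like(mu), mu])
    Omega = np.linalg.inv(np.asarray(omega_inv, float))
    A = X.T @ Omega @ X                   # det(A) is the Delta of the text
    lam, delta = np.linalg.solve(A, X.T @ Omega @ xbar)
    return lam, delta
```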

Notice that the coefficients, when a shape parameter exists, should be written more correctly as \(l_{i} ( \alpha ) \) and \(\mathrm{ d_{i} ( \alpha ) }\).

The study of the maximum efficiency for the Fréchet distribution can be summarized in the following short table (the \(\mu_i\) exist only for \(\alpha_0 > 2\)):

Table 9.3

                     max. eff (%)     p      q
 \( \alpha_0 = 3 \)      84.3        .10    .82
 \( \alpha_0 = 6 \)      88.7        .11    .72
 \( \alpha_0 = 8 \)      90.9        .07    .45
 \( \alpha_0 = 9 \)      91.9        .07    .44
 \( \alpha_0 = 10 \)     92.6        .06    .41

Recall that the result for \(\alpha_0 = 10\) is very similar to the one corresponding to the Gumbel distribution, as is known.

We can also, in the same way, study right, left and doubly censored samples. In the first and second cases this corresponds to choosing three probabilities \(0<p<q<r<1\) and discarding the right-block average \(\bar{x}_4\) or the left-block average \(\bar{x}_1\) (obvious notations), which is equivalent to taking \(\mathit{l}_4=d_4=0\) or \(\mathit{l}_1=d_1=0\). Clearly the triples \((\bar{x}_1,\bar{x}_2,\bar{x}_3)\) and \((\bar{x}_2,\bar{x}_3,\bar{x}_4)\) are asymptotically trinormal with a variance-covariance matrix which is the obvious sub-matrix of that of a 4-block partition, analogous to the one given. By the technique above we can obtain, when possible, the asymptotic relative efficiency of estimators \((\lambda^*,\delta^*)\) which are linear combinations of the averages and quasi-linearly invariant, i.e., for \(\alpha_0>2\) in both the Fréchet and Weibull (for minima) cases and in the Gumbel case, optimize over \(0<p<q<r<1\), and then compute the best coefficients which, as before, depend on \(\alpha_0\). Note that the best efficiency using three of the averages of a simply censored sample can be better than that using three averages of a non-censored sample; this seems a paradox but is easily explained. When using a simply censored sample we are partitioning it into four blocks and discarding the right or the left one, so the comparison should be made with the efficiency of a 4-block partition (not a 3-block one!), and then the efficiency of the uncensored sample is better.

A doubly censored sample corresponds to choosing four probabilities \(\mathrm{0<p<q<r<s<1 }\), and to discarding the extreme averages \(\mathrm{ \bar{x}_1 }\) and \(\mathrm{ \bar{x}_5 }\). The remaining triple \(\mathrm{ ( \bar{x}_{2},\bar{x}_{3},\bar{x}_{4} ) }\) is asymptotically trinormal with a variance-covariance matrix which is the obvious sub-matrix of the one corresponding to the 5-block partition. The best choice of the coefficients of the linear combination can be made, as before, by seeking the probabilities that maximize efficiency and then computing the best coefficients, depending as always on \(\mathrm{\alpha _{0} }\).

Let us, finally, consider the case where the three parameters \(\mathrm{ ( \lambda , \delta , \alpha >2 ) }\) are unknown (the Gumbel case is contained in the model for \(\mathrm{ \alpha =+~ \infty }\) or, practically speaking, \(\mathrm{ \alpha }\) large) and we have a non-censored sample with a 3-block partition.

We will not make all the computations but only sketch the method, along the lines of what was done previously.

Supposing \(\mathrm{ \alpha }\) known, we obtained the best estimators of \(\mathrm{ ( \lambda, \delta ) }\)

\(\mathrm{ \lambda ^{*}= \sum _{1}^{3}\mathit{l}_{i} \,( \alpha ) \bar{x}_{i} }\)

\(\mathrm{ \delta ^{*}= \sum _{1}^{3}d_{i} ( \alpha ) \,\bar{x}_{i} }\)

at the beginning of the section.

The ratio statistic \(\mathrm{ \frac{\bar{x}_{3}-\bar{x}_{2}}{\bar{x}_{2}-\bar{x}_{1}} }\), which converges in probability to \(\mathrm{ g ( \alpha ) =\frac{ \mu _{3}- \mu _{2}}{ \mu _{2}- \mu _{1}} }\), is location-dispersion-free.

Then, by the \(\mathrm{ \delta}\)-method, the ratio statistic is asymptotically normal with mean value \(\mathrm{ g \left( \alpha \right) }\) and thus, by the same method, \(\mathrm{ \alpha ^{*} }\), given by \(\mathrm{ g ( \alpha ^{*} ) =\frac{\bar{x}_{3}-\bar{x}_{2}}{\bar{x}_{2}-\bar{x}_{1}} }\), is asymptotically normal with variance \(\mathrm{ O ( n^{-1} ) }\). Also the triple \(\mathrm{ ( \lambda ^{*}, \delta ^{*}, \alpha ^{*} ) }\) is asymptotically normal with mean value \(\mathrm{ ( \lambda, \delta , \alpha) }\) and variance-covariance matrix \(\mathrm{ O ( n^{-1} ) }\). But, in fact, \(\mathrm{ \lambda ^{*}= \sum _{1}^{3}\mathit{l}_{i} \,( \alpha ) \bar{x}_{i} }\) and \(\mathrm{ \delta ^{*}= \sum _{1}^{3}d_{i} ( \alpha ) \,\bar{x}_{i} }\), written above, depend on \(\mathrm{\alpha }\), and we should use \(\mathrm{ \lambda ^{**}= \sum _{1}^{3}\mathit{l}_{i} ( \alpha ^{*} )\, \bar{x}_{i}, \delta ^{**}= \sum _{1}^{3}d_{i} ( \alpha ^{*} )\, \bar{x}_{i} }\), the triple \(\mathrm{ ( \lambda ^{**}, \delta ^{**}, \alpha ^{*} ) }\) being, also by the \(\mathrm{ \delta}\)-method, asymptotically trinormal with mean value \(\mathrm{ ( \lambda, \delta , \alpha) }\) and variance-covariance matrix of \(\mathrm{ O ( n^{-1} ) }\). The new coefficients \(\mathrm{ \mathit{l}_{i} ( \alpha ^{*} ) ,d_{i} ( \alpha ^{*} ) }\) are not the best ones (those for \(\mathrm{ \alpha >2 }\) supposed known), although they can be expected to be close to the best ones, as \(\mathrm{ \alpha ^* }\) should be close to \(\mathrm{ \alpha ~( >2 )}\). But the asymptotic efficiency of the method drops to values around 0.1 to 0.2, and so the method is poor, as could be expected. We could seek functions \(\mathrm{ \mathit{\bar{l}}_{i} ( \alpha) ,\bar{d}_{i} ( \alpha ) }\) such that \(\mathrm{ ( \sum _{1}^{3}\mathit{\bar{l}}_{i} ( \alpha ^{*} )~ \bar{x}_{i}, \sum _{1}^{3}\bar{d}_{i} ( \alpha ^{*} ) ~\bar{x}_{i}, \alpha ^{*} ) }\) has maximum asymptotic efficiency. This is an important open problem, and a very practical one, as the method allows a quick decision; recall that, as we want quasi-linear invariance, we must have \(\mathrm{ \sum _{1}^{3}\mathit{\bar{l}}_{i} (\alpha)=1 }\) and \(\mathrm{ \sum _{1}^{3}\bar{d}_{i} ( \alpha ) =0 }\). The extension of the previous reasoning to censored samples is immediate, as is that of the unsolved problem.
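As an illustration, the equation \(\mathrm{ g ( \alpha ^{*} ) =\frac{\bar{x}_{3}-\bar{x}_{2}}{\bar{x}_{2}-\bar{x}_{1}} }\) can be solved numerically once \(\mathrm{ g ( \alpha ) }\) is computable. A minimal Python sketch follows, taking the \(\mathrm{ \mu _{i} }\) as the conditional means of the standard Fréchet variable between consecutive block quantiles; the block probabilities p, q, the search bracket, and the monotonicity of g on it are illustrative assumptions:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

P, Q = 0.10, 0.82   # illustrative block probabilities 0 < p < q < 1

def frechet_quantile(t, alpha):
    # quantile function of the standard Frechet: F(x) = exp(-x**(-alpha))
    return (-np.log(t)) ** (-1.0 / alpha)

def block_mean(a, b, alpha):
    # mu_i: conditional mean of X between the a- and b-quantiles
    # (integrable endpoint singularity at t = 1 for alpha > 1)
    val, _ = quad(frechet_quantile, a, b, args=(alpha,))
    return val / (b - a)

def g(alpha):
    # location-dispersion-free ratio (mu3 - mu2) / (mu2 - mu1)
    m1 = block_mean(0.0, P, alpha)
    m2 = block_mean(P, Q, alpha)
    m3 = block_mean(Q, 1.0, alpha)
    return (m3 - m2) / (m2 - m1)

def alpha_star(xb1, xb2, xb3, lo=2.05, hi=50.0):
    # solve g(alpha*) = (xbar3 - xbar2) / (xbar2 - xbar1);
    # g is assumed monotone on [lo, hi]
    ratio = (xb3 - xb2) / (xb2 - xb1)
    return brentq(lambda a: g(a) - ratio, lo, hi)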

6 . Estimation using exceedances over thresholds

This section contains two different approaches, the peaks-over-threshold (P.O.T.) method and the all-excesses-over-threshold (A.E.O.T.) method.

The P.O.T. method can be connected with the notion of the return period, already given, and with the papers by Gumbel (1955) on calculated risk, Tiago de Oliveira (1984) on large earthquakes, and Leadbetter, Lindgren and Rootzén (1983) on extreme sea states.

Let us fix a threshold \(u\) and consider, in the sequence of random variables \(\mathrm{ \{ X_{i} \} }\), the exceedances over \(u\), i.e., the random variables \({\mathrm{{ X_{j}}}>{u }}\). If the \(\mathrm{X_{i} }\) have the distribution function \(\mathrm{F ( x ) }\), an exceedance \({\mathrm{{X_{j}}}( >{u} )} \) has the distribution function \(\mathrm{F ( x \vert \mathit{u} ) =\frac{F ( x ) -F ( \mathit{u} ) }{1-F ( \mathit{u} ) } ( \mathit{u}\leq x \leq \bar{\omega} _{F} ) }\); we will suppose \(\mathrm{F ( x ) }\) continuous.

In Chapter 2 we considered a sequence of thresholds (there called levels) such that \(\mathrm{n( 1-F ( \mathit{u}_{n} )) \rightarrow \tau~as~n \rightarrow + \infty }\), and briefly sketched a way of obtaining the asymptotic distribution of maxima. In the same way (see Leadbetter, Lindgren and Rootzén (1983)) we can prove that if \(\mathrm{n( 1-F ( \mathit{u}_{n} )) \rightarrow \tau~as~n \rightarrow + \infty }\) then

\(\mathrm{Prob \{ \# \{ X_{i}>u_{n} \} \leq k \} \rightarrow e^{- \tau} \sum _{s=0}^{k}\frac{ \tau^{s}}{s!} }\),

showing the asymptotically Poissonian character of exceedances of large thresholds in the i.i.d. case. An exceedance may be considered a large storm, a large earthquake, a large flood, etc.
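A quick simulation illustrates this Poisson character; as an assumption for illustration only, the \(\mathrm{X_{i} }\) are taken uniform on (0, 1), for which the levels \(\mathrm{ u_{n}=1- \tau /n }\) satisfy \(\mathrm{ n ( 1-F ( u_{n} ) ) = \tau }\) exactly:

import numpy as np

rng = np.random.default_rng(0)
n, tau, reps = 10_000, 2.0, 2_000
u_n = 1.0 - tau / n          # for U(0,1): n * (1 - F(u_n)) = tau

counts = np.array([(rng.random(n) > u_n).sum() for _ in range(reps)])
# both moments should be close to tau, as for a Poisson(tau) count
print(counts.mean(), counts.var())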

We will assume that the numbers of exceedances of a level \(\mathrm{ x }\) per cycle (e.g., 1 year) are i.i.d. Poisson random variables with mean value \(\mathrm{ \beta ( 1-F ( x \vert \mathit{u} ) ) }\); as \(\mathrm{ F ( \mathit{u} \vert \mathit{u} )=0 }\), the mean number of exceedances of \(u\) per cycle is \(\mathrm{ \nu = \beta }\), and in \(\mathrm{ t }\) cycles it is \(\mathrm{ \beta t }\). Thus the level \(\mathrm{ x_ T}\) such that the mean number of exceedances in \(\mathrm{ T }\) cycles is 1 is given by \(\mathrm{ 1= \beta T ( 1-F ( x_{T} \vert \mathit{u} ) ) }\), or \(\mathrm{ F ( x_{T} \vert \mathit{u} ) =1-1/ \beta T }\); \(\mathrm{ x_{T} }\) is evidently the design value for a return period of \(\mathrm{ T }\) (for the excess variables).
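For instance, with the illustrative values \(\mathrm{ \beta =2 }\) exceedances per cycle and a return period of \(\mathrm{ T=100 }\) cycles, the design level is the conditional quantile given by

\(\mathrm{ F ( x_{T} \vert \mathit{u} ) =1-\frac{1}{2 \times 100}=0.995 }\).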

Also

\(\mathrm{ Prob \{ max\,X_{i}~in~t~cycles \leq x \} }\)

\(\mathrm{ =Prob \{{ \# \{{X_{i}>x }\} ~in~t~cycles~equal~to~0 }\} =e^{- \beta t ( 1-F ( x \vert \mathit{u} ) ) } }\).

Let \(\mathrm{x ( L,R) }\)  be the level to be exceeded with probability \(\mathrm{R }\)  in \(\mathrm{L }\)  cycles. In the same way we have

\(\mathrm{ 1-R=e^{- \beta L ( 1-F ( x ( L,R ) \vert \mathit{u} ) ) } }\)

or \(\mathrm{ F ( x ( L,R ) \vert \mathit{u} ) =1+\frac{log ( 1-R ) }{ \beta L} }\) which with \(\mathrm{ F ( x_{T} \vert \mathit{u} ) =1-\frac{1}{ \beta T} }\) leads to the non-parametric relation

\(\mathrm{ 1-R=e^{-L/T} }\)

which connects the design value \(\mathrm{ x_ T}\) for one exceedance on average in \(\mathrm{ T }\) cycles to the level \(\mathrm{x ( L,R) }\) that can be exceeded in \(\mathrm{L }\) cycles with probability \(\mathrm{R }\) (the risk). Thus the two levels are connected and \(\mathrm{ x_ T}\) can be used for design, with a new interpretation.
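The relation is trivial to compute in either direction; a minimal sketch (the function names are ours):

import math

def risk(L, T):
    # probability that the T-cycle design level x_T is exceeded in L cycles
    return 1.0 - math.exp(-L / T)

def return_period(L, R):
    # return period T whose design level carries risk R over L cycles
    return -L / math.log(1.0 - R)

# example: a design life of L = 50 cycles with the T = 100 design value
# carries a risk of about 0.39; a 10% risk requires T of about 475 cycles
print(risk(50, 100), return_period(50, 0.10))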

Notice that the hypothesis that we were dealing with exceedances over large thresholds was necessary to justify the use of the Poisson approximation.

In fact exceedances in general cluster, and this Poisson distribution applies to the sequence of approximately independent clusters (with a convenient \(\mathrm{\beta }\)).

Let us now consider the peaks of each cluster and assume, as an example that has been useful, that \(\mathrm{F ( x ) }\) is the exponential distribution, so that \(\mathrm{ F ( x \vert u ) =1-e^{-{ ( x-u ) }/{ \delta }},x \geq u,~ \delta >0 }\), independently of the cluster distribution. Then we have \(\mathrm{ x_{T}=u+ \delta ~log ( \beta T ) }\).

Thus if we have, in \(t_1\)  cycles, \(n_1\) clusters above \(u\) we get

\(\mathrm{ \hat{\beta} =n_{1}/t_{1} }\)

and if peaks of the clusters are \(\mathrm{ x_ i}\) we get

\(\mathrm{\hat{\delta} =\frac{1}{n_{1}} \sum _{1}^{n_{1}} ( x_{i}-u ) =\frac{1}{n_{1}} \sum _{1}^{n_{1}}x_{i}-u=\bar{x}-u }\).

Once more by the \(\mathrm{ \delta }\)-method: as \(\mathrm{ \hat{\beta} }\) is asymptotically normal with mean value \(\mathrm{ \beta }\) and variance \(\mathrm{ \beta /t_{1} }\), and \(\mathrm{ \hat{ \delta } }\) is asymptotically normal with mean value \(\mathrm{ \delta }\) and variance \(\mathrm{ \delta ^{2}/n_{1} }\), with zero covariance (independence), we get that \(\mathrm{ \hat{x}_{T}=u+\hat{ \delta}\, log \,( \hat{\beta} T ) }\) is asymptotically normal with mean value \(\mathrm{ x_ T}\) and asymptotic variance \(\mathrm{ V ( \hat{x}_{T} ) \sim \frac{ \delta ^{2}}{ \beta ~t_{1}} ( 1+ ( log~ \beta T ) ^{2} ) }\).
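Putting these pieces together, a minimal sketch of the P.O.T. computation for exponential excesses (the function name is ours; peaks holds the cluster peaks observed in t1 cycles):

import numpy as np

def pot_design_value(peaks, u, t1, T):
    # P.O.T. estimate x_T = u + delta_hat * log(beta_hat * T) for
    # exponential excesses over the threshold u
    peaks = np.asarray(peaks, dtype=float)
    n1 = peaks.size
    beta_hat = n1 / t1                 # estimated clusters per cycle
    delta_hat = peaks.mean() - u       # ML estimator of the scale
    x_T = u + delta_hat * np.log(beta_hat * T)
    # asymptotic variance (delta^2 / (beta t1)) * (1 + (log beta T)^2)
    var = (delta_hat**2 / (beta_hat * t1)) * (1.0 + np.log(beta_hat * T)**2)
    return x_T, var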

Evidently the corresponding results can be obtained when using Gumbel and Fréchet distributions or distributions attracted to them.

We will finally deal with the A.E.O.T. method. We will use the papers by Smith (1984) and Davison (1984).

Let us assume an i.i.d. sample \(\mathrm{ \{ X_{i} \} }\) with distribution function \(\mathrm{F ( x ) }\), fix a threshold \(u\), and select only the exceedances \(\mathrm{ X_{i}> }u\). To a certain extent we are choosing, on average, the \(\mathrm{ n ( 1-F ( \mathit{u} ) ) }\) largest values, and so the results are analogous to those concerning the \(\mathrm{ m= n ( 1-F ( \mathit{u} ) ) }\) largest values, dealt with in the second section of this chapter. What is important now is that we are assuming that the largest values come from an i.i.d. sample; even setting aside seasonality, this is not necessarily valid, because large values tend to cluster, so independence between the large values generally fails. Note that in the P.O.T. method what was assumed was the (practical) independence of clusters plus a stable distribution of the peaks of the exceedances, which is much more reasonable. Because of this, the treatment will necessarily be sketchy.

Here we will use a preasymptotic form, the generalized Pareto distribution; note that the \(\mathrm{ \theta }\) here corresponds to the \(\mathrm{ \theta }\) of the von Mises-Jenkinson form \(\mathrm{G(z| \theta) }\) and is the symmetric (in sign) of the \(\mathrm{ \theta }\) of the usual generalized Pareto distribution. The generalized Pareto distribution has the form

\(\mathrm{ G ( y/ \delta \vert \theta ) =1- ( 1+ \theta ~y/ \delta ) _{+}^{-1/ \theta } }\)       if   \(\mathrm{ \theta \neq 0 }\)

\(\mathrm{ =1-e^{-y_{+}/ \delta } }\)                                    if   \(\mathrm{ \theta = 0 }\)

where  \(\mathrm{ \delta = \delta ( u ) >0 }\)  and \(\mathrm{ - \infty< \theta <+ \infty }\);  also \(\mathrm{ 0 \leq y<+ \infty }\)  for \(\mathrm{ \theta \geq 0 }\)  and \(\mathrm{ 0 \leq y \leq - \delta / \theta }\)  for \(\mathrm{ \theta < 0 . }\)
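As a direct transcription, the distribution function above can be sketched as follows (the sign convention is the one of the text, so \(\mathrm{ \theta <0 }\) gives the finite right endpoint \(\mathrm{ - \delta / \theta }\)):

import numpy as np

def gpd_cdf(y, delta, theta):
    # G(y/delta | theta) in the sign convention of the text
    y = np.maximum(np.asarray(y, dtype=float), 0.0)
    if theta == 0.0:
        return 1.0 - np.exp(-y / delta)
    z = np.maximum(1.0 + theta * y / delta, 0.0)
    # for theta < 0 and y beyond -delta/theta, z = 0 and the cdf equals 1
    return 1.0 - z ** (-1.0 / theta)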

Let \(\mathrm{ \bar{F} ( \Delta \vert u ) =F ( u+ \Delta \vert u ) }\)  for \(\mathrm{ 0 \leq \Delta \leq \bar{w}_{F}}-u\) be the distribution of the excess  \(\mathrm{ X-u }\)  over the threshold \(u\). Pickands (1975) proved that \(\mathrm{ F(.) }\) is attracted to \(\mathrm{ G \left( . \vert \theta \right) }\)  iff

\(\mathrm{ \lim_{u \rightarrow \bar{w}_{F}}~ \inf_{0< \delta <+ \infty}~ \sup_{ \Delta }~ \vert \bar{F} ( \Delta \vert u ) -G ( \Delta / \delta \vert \theta ) \vert =0 }\),

and so the generalized Pareto distribution \(\mathrm{ G ( \Delta / \delta ( u ) \vert \theta ) }\) is an approximation to \(\mathrm{\bar{F} ( \Delta \vert u ) }\), the distribution of the excess.

Then, assuming all the excesses \(\mathrm{{ X}}-u \) to be independent and that \(\mathrm{ \theta >-1/2 }\), we can obtain the ML estimators \(\mathrm{( \hat{ \delta} , \hat{ \theta} ) }\) of \(\mathrm{( \delta , \theta ) }\), which are asymptotically normal with mean values \(\mathrm{( \delta , \theta ) }\) and asymptotic variance-covariance matrix

\(\mathrm{\frac{1}{m}} \begin{bmatrix} \mathrm{2\,\delta^2(1+\theta) } & \mathrm{\delta(1+\theta) } \\[0.3em] \mathrm{\delta(1+\theta) } & \mathrm{(1+\theta)^2 } \\[0.3em] \end{bmatrix}\)

where \(m\)  is the number of exceedances over \(u\). We skip the results for  \(\mathrm{ \theta \leq-1/2 }\) .
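In practice the ML fit can be sketched with scipy, whose genpareto shape parameter has the same sign convention as the \(\mathrm{ \theta }\) used here (survival function \(\mathrm{ ( 1+ \theta y/ \delta ) ^{-1/ \theta } }\)); the asymptotic matrix above is then evaluated at the estimates. A sketch, valid for \(\mathrm{ \theta >-1/2 }\):

import numpy as np
from scipy.stats import genpareto

def fit_gpd_excesses(excesses):
    # ML fit to the excesses y = x - u > 0, with the location fixed at 0
    theta_hat, _, delta_hat = genpareto.fit(excesses, floc=0.0)
    m = len(excesses)
    # asymptotic variance-covariance of (delta_hat, theta_hat), theta > -1/2
    cov = np.array([[2.0 * delta_hat**2 * (1.0 + theta_hat),
                     delta_hat * (1.0 + theta_hat)],
                    [delta_hat * (1.0 + theta_hat),
                     (1.0 + theta_hat)**2]]) / m
    return delta_hat, theta_hat, cov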

Thus, once a high threshold \(u\) is fixed, assuming independence, the number \(m\) of exceedances has a binomial distribution with exceedance (or survival) probability \(\mathrm{ S( u ) =1-F ( u ) }\). Then, with the observed excesses \(\mathrm{ \Delta }\) assumed independent and with a generalized Pareto distribution, we can, conditionally on \(m\), obtain the ML estimators \(\mathrm{( \hat{ \delta} , \hat{ \theta} ) }\). An approximation to the design value \(\mathrm{ x_T }\) is then given by \(\mathrm{ F ( x_{T}) = ( F ( u+ ( x_{T}-u ) ) -F ( u ) ) +F ( u ) \approx S ( u )\, G ( \frac{x_{T}-u}{ \delta ( u ) } \vert \theta ) +F ( u ) =1-1/T }\), as in the first part of the section, or, equivalently, \(\mathrm{ x_{T}=u+ \Delta _{T} }\) with \(\mathrm{ \Delta _T }\) given by

\(\mathrm{ G ( \Delta _{T}/ \delta ( u ) \vert \theta ) =1-\frac{1}{S ( u ) \,T} }\)

or \(\mathrm{ \Delta _{T}=\frac{ ( S ( u ) \,T ) ^{ \theta }-1}{ \theta }~ \delta ( u ) }\), and thus \(\mathrm{ x_{T}=u+\frac{ ( S ( u ) \,T ) ^{ \theta }-1}{ \theta }~ \delta ( u ) }\) if \(\mathrm{ \theta \neq 0 }\) and \(\mathrm{ x_{T}=u+log~ ( S ( u ) \,T ) \, \delta ( u ) }\) if \(\mathrm{ \theta = 0 }\); clearly \(\mathrm{ \hat{x}_T }\) is asymptotically normal with mean value \(\mathrm{ x_T }\) and variance of \(\mathrm{ O(n^{-1}) }\). The last formula can be compared with \(\mathrm{x_{T}=u+ \delta~log~ ( \beta T ) }\), given previously for the exponential distribution, which corresponds to the Gumbel limiting distribution \(\mathrm{ \left( \theta =0 \right) }\); for large \(\mathrm{ u }\), \(\mathrm{ \beta }\) will in general be smaller than \(\mathrm{ 1 }\) and thus comparable with \(\mathrm{ S(u) }\).
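The design value then follows directly from the fitted pair; in the sketch below \(\mathrm{ S(u) }\) is estimated by the observed exceedance frequency m/n (our choice, consistent with the binomial remark above):

import numpy as np

def design_value(u, delta_hat, theta_hat, m, n, T):
    # x_T from G(Delta_T / delta | theta) = 1 - 1/(S(u) T), with S(u) ~ m/n
    S_u = m / n
    if theta_hat == 0.0:
        return u + delta_hat * np.log(S_u * T)
    return u + delta_hat * ((S_u * T) ** theta_hat - 1.0) / theta_hat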

This second part of the section needs a thorough revision owing to the hypotheses made.

For more details connected to tail estimation see Annex 4.

7 . Footnotes

(*) From the practical point of view, as in the previous section, we are assuming that the \(\mathrm{ x_{ij} }\) are i.i.d. and attracted to a Gumbel distribution, so that their distribution can also be approximated by a Gumbel distribution. What is essential in the subsamples (or block) maxima method is that the maxima have a Gumbel distribution, even if the observations of the subsample are dependent.

(**) Notice that there are situations where only the maxima \(\mathrm{ y_{j} }\) per block (year) are recorded; in this case Weissman’s method is not applicable, only the classical one.

References

1. Boos, D. D., 1983. Using extreme value theory to estimate large percentiles. Techn. Rep., Dept. Statist., North Carolina State University.

2. David, H. A., 1981. Order Statistics (2nd ed.), Wiley, New York.

4. Diaz, Jean-Pierre, 1985. Estimation et prédiction de valeurs extrêmes dans le domaine d'attraction de la loi de Fréchet. C. R. Acad. Sc. Paris, t. 301, sér. I, 541-544.

5. Gomes, M. I., 1981. An i-dimensional limiting distribution of largest values and its relevance to the statistical theory of extremes. Statistical Distributions in Scientific Work, 6, C. Taillie et al., eds., 389-410, D. Reidel, Dordrecht.

8. Gumbel, E. J., 1958. Statistics of Extremes, Columbia University Press, New York.

9. Hall, P., 1982. On estimating the end-point of a distribution. Ann. Statist., 10, 556-568.

11. Hüsler, J. and Schüpbach, M., 1986. On simple block estimators for the parameters of the extreme-value distribution. Commun. Statist.-Simul. Comput., 15, 61-76.

12. Hüsler, J. and Tiago de Oliveira, J., 1988. The usage of largest observations for parameter and quantile estimation for Gumbel distribution; an efficiency analysis. Publ. Inst. Statist. Univ. Paris, 33 (1), 41-56.

16. Neves, M., 1986. Estimação de quantis das distribuições de Gumbel e Fréchet com parâmetros de escala e de localização desconhecidos, baseados em duas estatísticas ordinais. Centro de Estatística e Aplicações, (25/86), Lisboa.

17. Neves, M., 1988. Estimação por blocos dos parâmetros da distribuição de Fréchet. Universidade Nova de Lisboa, Tese de Doutoramento.

24. Tiago de Oliveira, J., 1982. Efficient estimation for quantiles of Weibull distributions. Rev. Belge Statist. Rech. Oper., 22, 3-10.

25. Tiago de Oliveira, J., 1984. Weibull distributions and large earthquake modeling. Probabilistic Methods in the Mechanics of Solids and Structures (IUTAM Symposium in honour of Dr. W. Weibull), 81-89, S. Eggwertz and N. C. Lind, eds., Springer-Verlag, Heidelberg.

28. Weissman, I., 1981a. Confidence intervals for the threshold parameter II: unknown shape parameters. Commun. Statist.-Theor. Meth., 11, 2461-2467.

29. Weissman, I., 1981b. Confidence intervals for the threshold parameter. Commun. Statist.-Theor. Meth., A10, 549-557.