Annex-4

1. Academia das Ciências de Lisboa (Lisbon Academy of Sciences), Lisbon, Portugal.
The estimation of the tail of a distribution using extreme order statistics has been approached in different ways. We will describe the essential features of these approaches, with some comments and a comparison between them, as well as some remarks about open questions; we have drawn heavily on the review paper by Tiago de Oliveira (1991).
As there is complete duality between maxima and minima, and so between the right and left tails, we will analyse only the right tail estimation using the largest order statistics.
Under this umbrella two different problems can be dealt with:
a) estimation of the large quantiles (for probabilities close to 1)
and
b) estimation of the probability of overpassing some given level.
As was shown by Einmahl (1990), the problem of right tail estimation is only interesting when we are beyond (or near) the largest value of the sample, thus constituting a statistical extrapolation, and, from this perspective, some hypothesis needs to be made. For statistical extrapolation we must suppose that the distribution function of the i.i.d. sample lies in the domain of attraction of some asymptotic distribution of maxima. On the other hand, if we are interested only in the behaviour of the distribution in the interior of the support of the sample, i.e., the closed interval [minimum, maximum], the use of the usual sample distribution function is completely sufficient, and for this statistical interpolation we do not need to add any hypothesis except the usual (and convenient) one of continuity; in fact, it is a classical non-parametric approach.
Let us recall that a distribution function \( F(z) \) is attracted, for maxima, to \( G(z \vert \theta) \), denoted \( F(.) \in \mathit{D}(G(. \vert \theta)) \), if there exist attraction coefficients \( \{(\lambda_n, \delta_n > 0)\} \), not uniquely defined, such that for an i.i.d. sample \( (x_1,\dotsc,x_n) \) with distribution function \( F(x) \) we have:
\( P\{(\max(x_1,\dotsc,x_n)-\lambda_n)/\delta_n \leq z\} = P\{\max_{1\le i\le n} x_i \leq \lambda_n+\delta_n\,z\} = F^n(\lambda_n+\delta_n\,z) \rightarrow G(z \vert \theta) \)
as \(\mathrm{ n \rightarrow \infty }\); as the \(\mathrm{ G( z \vert \theta ) }\) are continuous the convergence is uniform. It should be noticed that if \(\mathrm{ F \left( . \right) \in ~\mathit{D} \left( \Phi _{ \alpha } \right) }\) we have \(\mathrm{ \bar{w}=+ \infty }\) and we can take \(\mathrm{ \lambda _{n}=0 }\) and if \(\mathrm{ F \in ~\mathit{D} ( \Psi _{ \alpha } ) }\) we have \(\mathrm{ \bar{w}<+ \infty }\) and we can take \(\mathrm{ \lambda _{n}= \bar{w} }\).
The necessary and sufficient conditions for attraction to one of the asymptotic distributions of maxima have been given before, the most practical being the one recalled in the description of the third approach, although some others will appear in the other two approaches.
For some time the limiting relation \( F^n(\lambda_n+\delta_n\,z) \rightarrow G(z\vert\theta) \) as \( n \rightarrow \infty \) led to the approximation \( F^n(\lambda_n+\delta_n\,z) \approx G(z\vert\theta) \), or \( F^n(y) \approx G((y-\lambda)/\delta \vert \theta) \), where the index \(n\) was omitted because \(\theta\), \(\lambda\) and \(\delta\) were to be estimated from a sample of maxima. This gave an approximation to \( F^n(y) \), and sometimes \( G^{1/n}((y-\lambda)/\delta \vert \theta) \), equal to \( G((y-\lambda')/\delta' \vert \theta) \) by stability, was used as a (very) rough approximation to \( F(y) \). Thus the idea of using the largest observations of a sample to estimate \( F(y) \) came naturally. As a side remark, it should be noted that, for instance, if \( F^n(\lambda_n+\delta_n\,z) \rightarrow \Lambda(z) \) as \( n \rightarrow \infty \), then for \(n\) not sufficiently large a better approximation to \( F^n(y) \) can be obtained by using, instead of \( G((y-\lambda)/\delta \vert 0) = \Lambda((y-\lambda)/\delta) \), a penultimate distribution function \( G((y-\lambda)/\delta \vert \theta) \) [with \(\lambda\), \(\delta\) and \(\theta\) to be estimated from the sample of i.i.d. maxima]. Such is the case for the normal distribution when the criterion used for comparison of the asymptotic approximation is the maximum absolute error in the probability evaluation, i.e., when we try to minimize \( \sup_y \vert N^n(\lambda_n+\delta_n\,y) - G(y\vert\theta)\vert \), where we have a sequence \( \theta = \theta_n \uparrow 0 \). The global approach used here integrates this penultimate approximation naturally by estimating, from the sample of i.i.d. maxima, the \(\theta\), \(\lambda\) and \(\delta\) that fit the data best.
Before continuing, an important remark must be made. When we are dealing with maxima only, in many cases we can split the sequence of observations (the concrete time-series) into natural blocks for which the maxima are surely independent, thus having a sample of independent maxima; this was the case for various examples in this Part 2. These maxima, as was said for stationary sequences, for markovian sequences, for \(m\)-dependent sequences, for strongly mixing sequences, and even for some evolutionary sequences, as will be shown in Part 4, have, under more or less mild conditions on association (many times, on the correlation), the same asymptotic distribution for maxima; the same happens for analogous continuous-time stochastic processes, for which we sometimes have records, often discretized for the statistical analysis. The natural block used for hydrological data is the natural hydrological year, but for oceanographical and meteorological data the block used, in general, is the civil (!) year. This is the well-known “yearly method”. But if we use the largest values of the time-series, it is obvious that the observations are not independent, not even stationary, because — even for an additive model — they carry the effects of trend and of cyclic variation, which have to be removed, many times shifting the largest values to other time points. Even when the sequences are assumed stationary, large (and also small) values occur in clusters, and so in general the 2nd, 3rd, ... maxima occur at time points close to that of the (1st) maximum, as the association (correlation) is very strong; this drawback can be avoided by “cleaning” the time-series of the cluster effect, which means that once the 1st maximum at instant \(t\) is obtained, the data should be “cleaned” of the observations at the instants (time steps) \(t-m\) to \(t+m\), the 2nd maximum sought in the remaining data, etc.; this is the natural procedure if we are dealing with a stationary \(m\)-dependent sequence. For other situations something similar must be found. Another difficulty appears when the largest values are taken from a time-series split into the “wrong” blocks: if the block is, for instance, a civil year (and not a natural one like the hydrological year), it may happen that a storm occurs at the end/beginning of the year and the maxima registered for two successive years correspond to observations made at 9 p.m. on 31st December and at 3 a.m. on the following 1st January, which clearly belong to the same cluster, contrary to the “cleaning” procedure sketched above, thus giving rise to errors in planning and design.
The “cleaning” procedure is thus essential to avoid errors; a good example of the “independentization” of observations in a time series is given by Dijk and de Haan (1990); the behaviour of the auto-correlation coefficient is sometimes a good tool to steer the choice of \(m\).
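As an illustration only, the following sketch (with hypothetical names; the window \(m\) and the toy series are assumptions of the example) extracts successive cluster maxima by discarding, after each retained maximum at instant \(t\), the observations at time steps \(t-m\) to \(t+m\):

```python
import numpy as np

def clean_largest(series, k, m):
    """Extract the k largest 'cleaned' values of a time series: once a
    maximum at time t is retained, observations at times t-m .. t+m are
    discarded before seeking the next maximum.  Assumes trend and cyclic
    variation have already been removed."""
    x = np.asarray(series, dtype=float).copy()
    largest = []
    for _ in range(k):
        t = int(np.nanargmax(x))          # index of the current maximum
        largest.append(x[t])
        lo, hi = max(0, t - m), min(len(x), t + m + 1)
        x[lo:hi] = np.nan                 # remove the whole cluster
        if np.all(np.isnan(x)):
            break
    return largest

# toy usage: a dependent (AR(1)-like) sequence of Gumbel innovations
rng = np.random.default_rng(1)
z = rng.gumbel(size=2000)
series = [0.6 * a + b for a, b in zip(np.r_[0, z[:-1]], z)]
print(clean_largest(series, k=5, m=10))
```

In practice the window \(m\) would be steered by the auto-correlation coefficient, as remarked above.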
We will now consider a large sample of i.i.d. observations, which is supposed to have a distribution function \( F(.) \) such that \( F(.) \in \mathit{D}(G(.\vert\theta)) \). This means, for practical applications, that the removal of the trend and cyclic variation and the cluster “cleaning” have been made. So far, it seems that only two situations exist where independence can be assumed directly: the largest ages at death for men, for women, or for both, in some country, if in the years considered no war, epidemic, national disaster or famine struck the region under study; and large-fires data (with compensation for inflation) in some region, if we discard fire “epidemics” like forest fires, fires from bombing, and large fires in built-up areas. In these cases the “pooling” of data is natural, giving rise to a (very) large sample from which a relatively large set of largest values can be used to estimate tail properties. Possibly analogous cases exist, but the conditioning needed to assume independence is very clear. Once the estimator of \( F(.) \) is obtained, we can estimate either the desired exceedance probability or the quantile.
The first approach is due to Pickands (1975) and has been used with variations in Smith (1987), Smith and Weissman (1985), Smith (1989) and Pickands (1989). Basically it consists of approximating the distribution of the excesses over a threshold by a generalized Pareto distribution, a result obtained in the important Pickands (1975) paper. The method thus corresponds to the P.O.T. (peaks over threshold) method sometimes used by engineers, as opposed to the yearly method.
The second approach is based on Weissman (1978), where the joint asymptotic distribution of a fixed number of the largest order statistics is used.
Both approaches have been used as if the asymptotic or preasymptotic distribution were the actual one, evidently raising the already classical criticism of the usage of \( G((y-\lambda)/\delta \vert \theta) \), with \(\lambda\), \(\delta\) and \(\theta\) estimated, for the actual distribution \( F^n(y) \) of the maximum of \(n\) observations when, in fact, both methods are approximations; note that the use of the penultimate distributions referred to is one way to compensate by using an approximation to the distribution of independent maxima. A guide to this approach was given by Tiago de Oliveira (1981) and (1984), where the theory of statistical choice between \( \theta < 0 \) (Weibull), \( \theta = 0 \) (Gumbel) and \( \theta > 0 \) (Fréchet) was developed; this statistical choice can give a better way of approaching the description of the statistical behaviour of independent maxima, even when generated by a dependence process, as happens in the yearly method.
A third and new approach has been developed essentially by Dekkers and de Haan (1989), Dekkers, Einmahl and de Haan (1989) and Dijk and de Haan (1990); it is connected with some previous results for the estimation of the shape parameter for Fréchet distribution, such as the Hill estimator.
This approach, although steered by the asymptotic distribution of maxima, does not use it as the actual distribution, in contrast to the previous ones that, at some point, substitute the actual (unknown) distribution by its asymptotic approximation.
Notice that the initial assumption that all observations are i.i.d. and that their distribution function is attracted to some asymptotic distribution of extremes \( G(z \vert \theta) \) leads to the use of upper order statistics — sometimes called extremal, sometimes intermediate, see Reiss (1989) — which are evidently dependent; independence is thus assumed at the outset but does not hold for the observations chosen for the statistical analysis.
The asymptotic normality of Pickands' (1975) estimator of \(\theta\) has been proved in various papers, such as that of Dekkers and de Haan (1989).
Complementary results can be found in Tiago de Oliveira, ed. (1984), especially the papers by Weissman (1984), Smith (1984) and Davison (1984), and also in Davison and Smith (1990).
This Annex applies results of Chapters 3 and 9 to the estimation of large quantiles and small exceedance probabilities; but it must be remarked that optimal procedures for the choice of the number and the orders of the large order statistics used, the efficiency evaluation, the speed of convergence, etc., are still open problems.
Let \( (x_1,\dotsc,x_n) \) be a sample of i.i.d. random variables (observations) with distribution function \( F(x) \), attracted to some \( G(z \vert \theta) \), and let \( \bar{w} = \inf\{x \vert F(x) = 1\} \) be the right-end point of \( F(.) \). As we are going to use the upper order statistics, let us recall that \( x''_k \) denotes the \(k\)-th maximum, i.e., the ascending order statistic \( x'_{n+1-k} = x'_{n+1-k:n} \); the maximum is obviously \( x''_1 \) and the minimum is \( x''_n \).
Recall that Pickands (1975) gave the name of standard generalized Pareto distribution to the distribution function \(\mathrm{ P( x \vert \theta ) =1- ( 1+ \theta ~x ) _{+}^{-1/ \theta } } \) where \(\mathrm{ P ( x \vert 0 ) } \), defined as \(\mathrm{ P ( x \vert 0^{+} ) =P ( x \vert 0^{-} )=1-e^{-x_{+}} } \), is the standard exponential distribution.
If \(\mathrm{ \theta \geq 0 } \) we have for \(\mathrm{ P ( x \vert \theta ) } \) the right-end point \(\mathrm{ \bar{w}=+ \infty } \) but for \(\mathrm{ \theta <0 } \) the right-end point is \(\mathrm{ \bar{w}=-1/ \theta } \); in general \(\mathrm{ \bar{w}=1/ ( - \theta ) _{+} } \) for the standard generalized Pareto distribution.
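A minimal numerical transcription of \( P(x \vert \theta) \), handling the \(\theta = 0\) exponential limit and the finite right-end point when \(\theta < 0\) (illustrative only):

```python
import math

def gpd_cdf(x, theta):
    """Standard generalized Pareto d.f. P(x|theta) = 1 - (1 + theta*x)_+^(-1/theta);
    for theta = 0 this is the standard exponential 1 - exp(-x_+)."""
    if x <= 0:
        return 0.0
    if theta == 0.0:
        return 1.0 - math.exp(-x)
    base = 1.0 + theta * x
    if base <= 0.0:       # beyond the right-end point w = -1/theta (theta < 0)
        return 1.0
    return 1.0 - base ** (-1.0 / theta)

# theta < 0 gives a finite right-end point w = 2 here:
print(gpd_cdf(1.9, -0.5), gpd_cdf(2.1, -0.5))   # < 1, then exactly 1
```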
Evidently if \(\mathrm{X } \) has the distribution function \(\mathrm{ F ( x ) } \), the conditional survival function of the excess \(\mathrm{ X - u } \), when \(\mathrm{ X \geq u } \) , is given by
\(\mathrm{ S ( x \vert u ) =P(X>u+x \vert X>u ) =\frac{1-F ( u+x ) }{1-F ( u ) } } \) for \(\mathrm{ x \geq 0 } \)
and the conditional distribution function of the excess is evidently
\(\mathrm{ F ( x \vert u ) =1-S ( x \vert u ) =\frac{F ( x+u ) -F ( u ) }{1-F ( u ) } } \) for \(\mathrm{ x \geq 0 } \).
Pickands (1975) proved the following result:
\( \lim_{u \uparrow \bar{w}}\,\vert F(x \vert u) - P(x/\delta(u) \vert \theta)\vert = \lim_{u \uparrow \bar{w}}\,\vert S(x \vert u) - (1+\theta\,x/\delta(u))_+^{-1/\theta}\vert = 0 \)
iff \(F ( . ) \in ~D ( G ( . \vert \theta ) ) \).
Notice that the scale parameter \(\mathrm{ \delta ( u ) } \) depends on \(u\).
Also in the same paper it was proved that:
If \( m_n \) is an integer such that \( 1 \leq 4m_n \leq n \) and \( m_n \rightarrow \infty \) but \( m_n/n \rightarrow 0 \),
then \(\theta ^{*}=log ( \frac{x_{m_{n}}^{''}-x_{2m_{n}}^{''}}{x_{2m_{n}}^{''}-x_{4m_{n}}^{''}} ) /log~2\)
and \(\delta ^{*}= \theta ^{*} ( x_{2m_{n}}^{''}-x_{4m_{n}}^{''} ) / ( 2^{ \theta ^{*}}-1 )\) are such that
\( \sup_x \vert F(x \vert x''_{4m_n}) - P(x/\delta^* \vert \theta^*)\vert \stackrel{a.s.}{\rightarrow} 0 \) and also that \( \theta^* \stackrel{P}{\rightarrow} \theta \).
This theorem shows that, for the (large) random threshold \( x''_{4m_n} \), the conditional distribution of the random excess \( x - x''_{4m_n} \) is asymptotically approximated by a generalized Pareto distribution.
A rule for the choice of \(\mathrm{ m_{n} } \) is also given.
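A direct transcription of these estimators (a sketch; the choice of \(m_n\) is left to the user, and the function name is ours):

```python
import numpy as np

def pickands(sample, m):
    """Pickands (1975) estimators theta*, delta* from the m-th, 2m-th and
    4m-th maxima.  Requires 4*m <= n; x''_k is the k-th largest value."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]   # descending order
    xm, x2m, x4m = x[m - 1], x[2 * m - 1], x[4 * m - 1]
    theta = np.log((xm - x2m) / (x2m - x4m)) / np.log(2.0)
    delta = theta * (x2m - x4m) / (2.0 ** theta - 1.0)
    return theta, delta

rng = np.random.default_rng(0)
print(pickands(rng.gumbel(size=10_000), m=50))   # theta* should be near 0
```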
Later Dekkers and de Haan (1989) proved the asymptotic normality of \(\theta^*\) (see the third approach, first result, of this Annex).
Smith (1987) (with a change of sign in \(\theta\)) used Pickands' (1975) result with either a moving or a random threshold \(u_n\). Evidently, conditional on a given \(u\), the excesses \( x_i - u \) are independent random variables and their number (the number of exceedances of \(u\)) is random (a binomial variate if \(u\) is not random). Thus \(u\) should be neither too large, giving few exceedances, nor too small, making almost all the sample give rise to exceedances. The excesses are then taken as having, as actual distribution, a generalized Pareto one, and estimation is made by the method of maximum likelihood, which is regular only if \( \theta > -1/2 \), with an extension if \( \theta \leq -1/2 \). Let us denote by \( (\hat\theta, \hat\delta) \) the maximum likelihood estimators. Then Smith (1987), assuming that the excesses have an actual generalized Pareto distribution, shows, with \(N\) denoting the random number of excesses, that:
\( \sqrt{N}\,(\hat\delta/\delta - 1,\ \hat\theta - \theta) \) is asymptotically binormal with zero mean values and variance-covariance matrix \( V = (1+\theta)\begin{bmatrix} 2 & 1 \\[0.3em] 1 & 1+\theta \end{bmatrix} \), which is the Cramér-Rao bound.
It is then shown that if the excesses do not have the actual generalized Pareto distribution but
\(\mathrm{ F(x \vert u ) -P ( x/ \delta ( u ) \vert \theta ) =O ( 1/\sqrt[]{N} ) } \) then
\( \sqrt{N}\,(\hat\delta/\delta - 1,\ \hat\theta - \theta) \) is asymptotically binormal with mean values not necessarily zero but with the same variance-covariance matrix \(V\).
The non-zero mean values allow for the existence of a bias in the approximation; but knowledge of this bias depends on the structure and the form of the tail \(\mathrm{ 1- F ( x ) } \). The study is made for each form of the asymptotic distribution (Weibull, Gumbel or Fréchet) in particular, with additional conditions.
The distribution function for values \( y = u+x,\ x \geq 0 \), naturally using \( 1-\hat F(u) = N/n \), is then estimated by \( 1-\hat F(u+x) = \frac{N}{n}\,(1+\hat\theta\,x/\hat\delta)_+^{-1/\hat\theta} \), or \( \hat F(y) = 1-\frac{N}{n}\,(1+\hat\theta\,(y-u)/\hat\delta)_+^{-1/\hat\theta} \) for \( y \geq u \), which is asymptotically normal by the use of the \(\delta\)-method; see Tiago de Oliveira (1982).
The quantile for probability \( p = 1-\varepsilon \) is estimated by solving back the equation \( \hat F(y) = 1-\varepsilon \) and so is given by \( \hat y(1-\varepsilon) = u + \frac{\hat\delta}{\hat\theta}\,\bigl[\bigl(\frac{n\,\varepsilon}{N}\bigr)^{-\hat\theta} - 1\bigr] \), which is also asymptotically normal by the \(\delta\)-method.
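Assuming maximum likelihood estimates \( (\hat\theta, \hat\delta) \) are already available (the numerical optimization is not reproduced here), the tail and quantile estimates above can be sketched as follows; the numbers in the usage lines are invented for illustration:

```python
def tail_prob(y, u, N, n, theta_hat, delta_hat):
    """1 - F_hat(y) = (N/n) * (1 + theta*(y-u)/delta)_+^(-1/theta), y >= u."""
    base = 1.0 + theta_hat * (y - u) / delta_hat
    return (N / n) * max(base, 0.0) ** (-1.0 / theta_hat)

def quantile(eps, u, N, n, theta_hat, delta_hat):
    """Solves F_hat(y) = 1 - eps:
    y = u + (delta/theta) * ((n*eps/N)^(-theta) - 1)."""
    return u + delta_hat * ((n * eps / N) ** (-theta_hat) - 1.0) / theta_hat

# assumed toy numbers: n = 5000 observations, N = 100 excesses over u = 10
print(tail_prob(20.0, u=10.0, N=100, n=5000, theta_hat=0.2, delta_hat=2.0))
print(quantile(1e-3, u=10.0, N=100, n=5000, theta_hat=0.2, delta_hat=2.0))
```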
The choice of the threshold \(u_n\) (random or moving) and the way that \( u_n \uparrow \bar{w} \) are partially discussed, distribution by distribution, using some additional assumptions on the type of variation of the tail \( 1-F(y) \).
Here are some examples:
If \( F \in \mathit{D}(\Phi_\alpha) \), that is, iff \( \bar{w} = +\infty \) and \( L(x) = x^\alpha(1-F(x)) \) is of slow variation (i.e., \( \frac{L(tx)}{L(x)} \rightarrow 1 \) as \( x \rightarrow \infty \)), and if moreover \( L(tx)/L(x) = 1 + k(t)\,\phi(x) + o(\phi(x)) \), where \( k(t) = \frac{C}{\rho}(t^\rho - 1)\ (\rho < 0) \), then the previous asymptotic result is valid with the mean values \( \bigl(\frac{\mu(1+\theta)(1+2\theta\rho)}{1+\theta-\theta\rho},\ \frac{-\mu(1+\theta)\,\theta(1+\rho)}{1+\theta-\theta\rho}\bigr) \) and the variance-covariance matrix \(V\) given above; here we denote \( \mu = \operatorname{p\,lim}\ \frac{\sqrt{N}\,C\,\phi(u_N)}{\alpha-\rho} \). Recall that \( \theta = 1/\alpha \) and \( \delta(u) = u/\alpha \).
The Hill estimator for \(\mathrm{ \alpha } \) is given by
\(\mathrm{ \alpha _{n}^{*}=\frac{N}{ \sum _{1}^{N}log\,x_{i}^{''}-N~log~u} } \)
and the asymptotic ratio of \(\mathrm{ \frac{MSE ( \hat{\alpha} _{n} ) }{MSE ( \alpha _{n}^{*} ) } } \) is
\(\mathrm{ ( \alpha +1 ) ^{2} ( \frac{1+ \rho }{ \alpha +1- \rho } ) ^{2 \alpha / ( \alpha -2 \rho ) } } \),
which shows that, according to the value of \(\rho\), one or the other of the estimators is better.
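A sketch of the Hill estimator as written above (the threshold \(u\) and the test distribution are assumptions of the example):

```python
import numpy as np

def hill(sample, u):
    """Hill estimator alpha* = N / (sum(log x_i'') - N*log(u)) computed
    over the N exceedances x_i'' > u of a positive-valued sample."""
    x = np.asarray(sample, dtype=float)
    exc = x[x > u]
    N = exc.size
    return N / (np.sum(np.log(exc)) - N * np.log(u))

rng = np.random.default_rng(2)
pareto = rng.pareto(3.0, size=20_000) + 1.0   # 1 - F(x) = x^(-3), x >= 1
print(hill(pareto, u=2.0))                    # should be near alpha = 3
```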
The choice of a deterministic sequence \( \{u_n\} \) must be such that \( n(1-F(u_n)) \rightarrow \infty \) and \( \sqrt{n(1-F(u_n))}\;C\,\phi(u_n) \rightarrow \mu(\alpha-\rho) \), a result which is of doubtful usefulness because \( F(x) \) is generally unknown, as are \(\mu\) and \(\alpha\); in the case where \(u\) is a random threshold we choose a sequence \(N_n\) such that \( N_n \rightarrow \infty \), \( N_n/n \stackrel{P}{\rightarrow} 0 \), and use \( u_n = x''_{N_n+1} \).
The case for \(\mathrm{ \Psi _{ \alpha } }\), when \(\mathrm{ \alpha>2 }\), has some similarities with the previous one. The necessary and sufficient condition to be attracted to \(\mathrm{ \Psi _{ \alpha }(x) }\) is that \(\mathrm{ \bar{w}<+ \infty }\) and \(\mathrm{ L ( x ) =x^{ \alpha } ( 1-F ( \bar{w}-1/x) ) }\)is also of slow variation.
Assuming the same condition as before, with \( \rho < 0 \), and defining \(\mu\) by \( \mu = \lim \sqrt{N}\,C\,\phi((\bar{w}-u_n)^{-1})/(\alpha-\rho) \), we see that it is also valid with the mean values
\(\mathrm{ ( \frac{ \mu ( 1+ \theta ) ( 1+2~ \theta ~ \rho ) }{1+ \theta + \theta ~ \rho },\frac{- \mu ~ \theta ( 1+ \theta ) ( 1- \rho ) }{1+ \theta + \theta ~ \rho } ) }\)
and also the same variance-covariance matrix \(V\). We have, then, \( \theta = -1/\alpha \) and \( \delta = (\bar{w}-u)/\alpha \). This result is essentially due to Smith and Weissman (1985).
The estimator of the right-end point \(\mathrm{ \bar{w} }\) is given by \(\mathrm{ \hat{w}=u_{n}+ \hat{\alpha} ~ \hat{\delta} =u_{n}- \hat{ \delta} / \hat{ \theta } }\) which is also asymptotically normal. The random threshold \(\mathrm{ u_{n}=x_{N_{n}+1}^{''} }\) must be such that \(\mathrm{ \bar{w}-u_{n}=O_{p} ( ( N/n ) ^{1/ \alpha } ) =O_{p} ( ( n/N ) ^{ \theta } ) }\), but \(\mathrm{ \theta }\) is unknown except for the bounds \(\mathrm{ -1/2< \theta <0 }\) .
The cases for \(\mathrm{ \alpha ~ \in ] 0,2 ] }\) are analogous but more complex.
Now consider the asymptotic distribution \(\Lambda\). Although three approaches are presented in Smith (1987), we will describe only the second one because it corresponds to the P.O.T. method used by engineers, which directly assumes \( \theta = 0 \). The estimator of \(\delta\) is then \( \hat\delta = \frac{1}{N}\sum_1^N x''_i - u_n \), where the \( x''_i \) are the order statistics larger than \( u_n \). Then for the estimator of the distribution function
\( \hat F(y) = 1 - \frac{N}{n}\exp\bigl(-\frac{y-u_n}{\hat\delta}\bigr) \)
it can be shown that
\(\mathrm{ \sqrt[]{N} ( \frac{1- \hat{F} ( u_{n}+z~\phi ( u_{n} ) ) }{1-F ( u_{n}+z~\phi( u_{n} ) ) }-1 ) }\)
is asymptotically normal with mean value \( \mu z - \frac{\mu z^2}{2} \) and variance \( 1+z^2 \), where we define \(\mu\) by \( \mu = \operatorname{p\,lim}\ \sqrt{N}\,\phi(u_n)/n \) and \( \phi(t) \) is such that
\(\mathrm{ 1-F ( x) =c ( x ) exp( - \int _{- \infty}^{x}\frac{d~t}{\phi ( t ) }) ,x<\bar{w} }\)
with \( c(x) \rightarrow 1 \) as \( x \rightarrow \bar{w} \), where \(\phi\) is a positive differentiable function and \( \phi'(x) \rightarrow 0 \) as \( x \rightarrow \bar{w} \).
Notice that, in this result, we have an asymptotic evaluation of the relative error of the approximation by the exponential distribution. The fact that the result, once more, depends on knowledge (at least global) of the initial \(F\) leads to difficulties.
This approach stems essentially from Weissman's (1978) paper and assumes that the number of upper order statistics used is fixed, although the sample size increases indefinitely. The sample \( (x_1,\dots,x_n) \) is assumed, once more, to be i.i.d. with a distribution function which is in the domain of attraction of one of the three asymptotic distributions, supposed known, analogously to what happened before.
Let us then suppose that \(\mathrm{ F~ \in \mathit{D }\left( \Lambda \right) }\), i.e., that \(\mathrm{ F^{n} ( \lambda _{n}+ \delta _{n}~x ) \rightarrow \Lambda ( x ) }\). Denoting by \(\mathrm{ y_{i,n}= ( x_{i}^{''}- \lambda _{n} ) / \delta _{n}=y_{i} }\) for short, it can be shown that:
The vector \( (y_1,\dots,y_m) \), if \( F \in D(\Lambda) \) with attraction coefficients \( \lambda_n \) and \( \delta_n\,(>0) \), has the limiting density \( \exp\bigl(-e^{-y_m} - \sum_1^m y_i\bigr) \) with \( y_1 \geq y_2 \geq \dots \geq y_m \).
Thus we can approximate the actual distribution of the \(\mathrm{ x_{i}^{''}~ }\) \(\mathrm{ ( or~y_{i} ) }\)\(\mathrm{( i=1,~2,\dots,m) }\) by the asymptotic one, then having the likelihood
\( \frac{1}{\delta^m}\exp\bigl(-e^{-\frac{x''_m-\lambda}{\delta}} - \sum_1^m \frac{x''_i-\lambda}{\delta}\bigr) \).
The random pair \(\mathrm{ ( \sum _{1}^{m}x_{i}^{''},x_{m}^{''} ) }\) is a sufficient statistic for \(\mathrm{ ( \lambda , \delta) }\) and the maximum likelihood estimators \(\mathrm{ ( \hat{ \lambda} , \hat{\delta }) }\) and minimum variance unbiased estimators \(\mathrm{ ( {\lambda} ^{*}, {\delta} ^{*} ) }\) of \(\mathrm{ ( \lambda , \delta) }\) are
\( \hat\delta = \frac{1}{m}\sum_1^m x''_i - x''_m, \qquad \hat\lambda = x''_m + \log m\cdot\hat\delta \)
and
\( \delta^* = \frac{1}{m-1}\sum_1^{m-1} x''_i - x''_m = \frac{m}{m-1}\,\hat\delta, \qquad \lambda^* = x''_m + (S_m-\gamma)\,\delta^* \)
with \( S_m = \sum_1^{m-1}\frac{1}{i} \). We have \( V(\delta^*) = \frac{\delta^2}{m-1} \) and
\( V(\lambda^*) = \bigl[\frac{(S_m-\gamma)^2}{m-1} + S'_m\bigr]\delta^2 \), where \( S'_m = \frac{\pi^2}{6} - \sum_1^{m-1}\frac{1}{i^2} \).
By a result from Mejzler (1949), also given by de Haan (1970), we have, for the quantile \( F^{-1}(1-c/n) \), the basic relation \( (F^{-1}(1-c/n) - \lambda_n)/\delta_n \rightarrow -\log c \), which shows that for \( \lambda_n \) we can take \( F^{-1}(1-1/n) \), and for \( \delta_n \) we can take \( F^{-1}(1-1/(n\,e)) - F^{-1}(1-1/n) \), as known since Gnedenko's (1943) paper. Thus
The quantile \( F^{-1}(1-c/n) \), if \( F \in D(\Lambda) \), can be estimated by \( \hat F^{-1}(1-c/n) = \hat\lambda_n - \hat\delta_n\,\log c \) or by \( F^{*-1}(1-c/n) = \lambda_n^* - \delta_n^*\,\log c \).
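A sketch of the maximum-likelihood version of these estimators and of the resulting quantile estimate for the Gumbel domain, under the sign conventions of the basic relation above (the choice of \(m\) is the user's):

```python
import numpy as np

def weissman_gumbel_quantile(sample, m, c):
    """Estimate F^{-1}(1 - c/n) for F in the Gumbel domain from the m
    largest observations: delta = (mean of top m) - (m-th maximum),
    lambda = x''_m + delta*log(m), quantile = lambda - delta*log(c)
           = x''_m + delta*log(m/c)."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1][:m]
    delta = x.mean() - x[m - 1]
    lam = x[m - 1] + delta * np.log(m)
    return lam - delta * np.log(c)

rng = np.random.default_rng(3)
z = rng.gumbel(size=10_000)
# compare with the true quantile of the standard Gumbel, n = 10000, c = 0.5:
print(weissman_gumbel_quantile(z, m=50, c=0.5),
      -np.log(-np.log(1 - 0.5 / 10_000)))
```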
In the case where \( F \in \mathit{D}(\Phi_\alpha) \), as previously stated, \( \bar{w} = +\infty \) and we can take \( \lambda_n = 0 \); as \( \Phi_\alpha(z) = \Lambda(\alpha\log z)\ (z>0) \), \( \log X\ (X>0) \) is attracted to \(\Lambda\). Reducing to the previous case and afterwards returning to the observed values, we have
If \(\mathrm{ F~ \in \mathit{D} ( \Phi _{ \alpha } ) }\), ( \( \alpha \) unknown) we have
\(\hat{ \alpha }= ( \frac{1}{m} \sum _{1}^{m}log\,x_{i}^{''}-log\,x_{m}^{''} ) ^{-1} \) and \(\hat{F}^{-1} ( 1-c/n ) = ( m/c ) ^{1/ \hat{\alpha} }~x_{m}^{''} \).
For the case \(\mathrm{ F~ \in \mathit{D} ( \Psi _{ \alpha } ) }\) we know that \(\mathrm{ \bar{w}<+ \infty }\) and \(\mathrm{ \Psi _{ \alpha } ( z ) = \Lambda ( - \alpha ~log ( -z ) ) }\) for \(\mathrm{ z<0 }\) and then
If \(\bar{w} \) is known and \(\mathrm{ F~ \in \mathit{D} ( \Psi _{ \alpha } ) }\), then we have
\( \hat\alpha = \bigl(\log(\bar{w}-x''_m) - \frac{1}{m}\sum_1^m \log(\bar{w}-x''_i)\bigr)^{-1} \) and
\( \hat F^{-1}(1-c/n) = \bar{w} - \bigl(\frac{c}{m}\bigr)^{1/\hat\alpha}(\bar{w}-x''_m) \).
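The Fréchet and Weibull cases reduce to the Gumbel one through \(\log x\) and \(-\log(\bar{w}-x)\) respectively; a direct transcription of the two quantile estimators (\(\bar{w}\) assumed known in the second):

```python
import numpy as np

def frechet_quantile(sample, m, c):
    """F in D(Phi_alpha): alpha_hat from the top-m log-spacings, then
    F_hat^{-1}(1-c/n) = (m/c)^(1/alpha_hat) * x''_m."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1][:m]
    alpha = 1.0 / (np.mean(np.log(x)) - np.log(x[m - 1]))
    return (m / c) ** (1.0 / alpha) * x[m - 1]

def weibull_quantile(sample, m, c, w):
    """F in D(Psi_alpha) with known right-end point w:
    F_hat^{-1}(1-c/n) = w - (c/m)^(1/alpha_hat) * (w - x''_m)."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1][:m]
    alpha = 1.0 / (np.log(w - x[m - 1]) - np.mean(np.log(w - x)))
    return w - (c / m) ** (1.0 / alpha) * (w - x[m - 1])

rng = np.random.default_rng(4)
print(frechet_quantile(rng.pareto(2.0, 10_000) + 1.0, m=100, c=1.0))  # ~100
print(weibull_quantile(rng.uniform(0.0, 1.0, 10_000), m=100, c=1.0, w=1.0))
```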
The case where \(\mathrm{ \bar{w} }\) is unknown was not considered.
As stated previously, the third approach is different from the previous ones, in that it does not use the approximation given by an asymptotic or preasymptotic distribution as the actual one, as was done in the first and second approaches in one way or another.
The essential result for large quantile estimation is based on the attraction conditions using the quantile function \( Q(v) = F^{-1}(v) \), the inverse function \( F^{-1}(v) = \inf\{x \vert F(x) \geq v\} \), given by Mejzler (1949) and de Haan (1970). We have the basic result:
\( F(.) \in D(G(.\vert\theta)) \) iff \( \dfrac{Q(1-t\,x) - Q(1-t)}{Q(1-t\,y) - Q(1-t)} \rightarrow \dfrac{x^{-\theta}-1}{y^{-\theta}-1} \)
\( \bigl(\dfrac{\log x}{\log y}\ \text{if } \theta = 0\bigr) \), locally uniformly as \( t \rightarrow 0 \).
Let us denote by \( F_n(x) \) the sample distribution function and by \( Q_n(v) = F_n^{-1}(v) \) the sample quantile function, which are both in correspondence with \( F(x) \) and \( Q(v) \). It is immediate that \( Q_n(1-m_n/n) = x''_{m_n} \). Thus if we are seeking the \(p\)-quantile of \( F(x) \), i.e., the solution of \( F(x) = p = 1-\varepsilon \), this is equivalent to seeking \( Q(1-\varepsilon) \). Evidently, if we want to design a dam with a probability \(\varepsilon'\) of being overpassed in \(n'\) units of time under the i.i.d. hypothesis, we have to solve the equation \( F^{n'}(x) = 1-\varepsilon' \) or \( F(x) = (1-\varepsilon')^{1/n'} \), that is, use \( Q(1-\varepsilon) \) where \( \varepsilon = 1-(1-\varepsilon')^{1/n'} \) is closer to zero the smaller \(\varepsilon'\) is and the larger \(n'\) is.
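For instance, a numerical check of this conversion (the values are invented for illustration):

```python
# design requirement: probability eps1 of being overpassed in n1 time units
eps1, n1 = 0.05, 100
eps = 1.0 - (1.0 - eps1) ** (1.0 / n1)   # per-unit exceedance probability
print(eps)   # about 5.13e-04: it is Q(1 - eps) that must be estimated
```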
Our purpose is thus to estimate \( Q(1-\varepsilon) \) from a sample of \(n\) i.i.d. observations whose distribution function is attracted to \( G(.\vert\theta) \), using some of the largest order statistics.
As we can write
\(\mathrm{ Q ( 1- \varepsilon ) =\frac{Q ( 1- \varepsilon ) -Q ( 1-\frac{m_{n}}{n} ) }{Q ( 1-\frac{m_{n}}{n} ) -Q ( 1-\frac{2~m_{n}}{n} ) }[ Q ( 1-\frac{m_{n}}{n} ) -Q ( 1-\frac{2~m_{n}}{n} ) ] +Q ( 1-\frac{m_{n}}{n} ) }\)
replacing \(Q\) by \(Q_n\) where convenient, as \( \frac{m_n}{n} \rightarrow 0 \) by hypothesis, and substituting the first quotient by its limit given by the attraction conditions (writing \( \varepsilon = \frac{m_n}{n}\cdot\frac{n\,\varepsilon}{m_n} \)), we obtain
\(\mathrm{ Q_{n}^{*} ( 1- \varepsilon ) =\frac{{ ( m_{n}}/{ \varepsilon ~n ) ^{ \theta _{n}^{*}}-1}}{1-2^{- \theta _{n}^{*}}} [ x_{m_{n}}^{''}-x_{2m_{n}}^{''} ] +x_{m_{n}}^{''} }\)
where \( \theta_n^* \) is the estimate of \(\theta\) given by Pickands (1975),
\( \theta_n^* = \log\dfrac{x''_{m_n}-x''_{2m_n}}{x''_{2m_n}-x''_{4m_n}}\Big/\log 2 \).
Only three large order statistics (intermediate ones, since \( m_n \rightarrow \infty \) but at a slower rate than \(n\)) are thus sufficient to deal directly with the estimation of the quantile of probability \( 1-\varepsilon \).
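A sketch of \( Q_n^*(1-\varepsilon) \) as written above (the choice of \(m_n\) is again left to the user):

```python
import numpy as np

def quantile_third_approach(sample, m, eps):
    """Q_n^*(1-eps) built from x''_m, x''_2m, x''_4m and Pickands' theta*."""
    n = len(sample)
    x = np.sort(np.asarray(sample, dtype=float))[::-1]
    xm, x2m, x4m = x[m - 1], x[2 * m - 1], x[4 * m - 1]
    theta = np.log((xm - x2m) / (x2m - x4m)) / np.log(2.0)
    factor = ((m / (eps * n)) ** theta - 1.0) / (1.0 - 2.0 ** (-theta))
    return factor * (xm - x2m) + xm

rng = np.random.default_rng(5)
z = rng.gumbel(size=20_000)
print(quantile_third_approach(z, m=100, eps=1e-4),
      -np.log(-np.log(1.0 - 1e-4)))    # compare with the true quantile
```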
Then Dekkers and de Haan (1989) show the following results:
If \({Q \left( t \right) }\) has a positive derivative and a positive function \({b \left( t \right) }\) exists such that
\( \dfrac{t^{1+\theta}\,\bigl[\,x^{1+\theta}\,Q'(1-t\,x) - Q'(1-t)\,\bigr]}{b(t)} \rightarrow \pm\log x \quad \text{as } t \rightarrow 0, \)
then \( F(.) \in D(G(.\vert\theta)) \) and \( \sqrt{m_n}\,(\theta_n^* - \theta) \) is asymptotically normal with mean value zero and variance \( \theta^2(2^{2\theta+1}+1)/(2(2^\theta-1)\log 2)^2 \) for sequences \( m_n \rightarrow \infty \) such that \( m_n = o(n/g^{-1}(1/n)) \), with \( g(t) = t^{1+2\theta}\,Q'(1-t)^2/b^2(t) \) (when \( \theta = 0 \) the asymptotic variance is \( \frac{3}{4(\log 2)^4} \), as follows from l'Hôpital's rule).
This allows us to conclude the consistency of \( \theta_n^* \). As we are dealing with intermediate order statistics, which are asymptotically normal — see Reiss (1989) — and as \( Q_n^*(1-\varepsilon) \) is expressed in terms of them and of \( \theta_n^* \), all asymptotically normal, by the \(\delta\)-method we can expect \( Q_n^*(1-\varepsilon) \) to be asymptotically normal. In fact we have
If \(Q’ \left( t \right) \) exists and \(t^{1+ \theta }~Q’ \left( 1-t \right) \) is of slow variation when \(t \downarrow 0 \) then
\(\sqrt[]{2m_{n}}~\frac{{x}_{m_{n}}^{''}-Q ( 1- \varepsilon _{n} ) }{{x}_{m_{n}}^{''}-{x}_{2m_{n}}^{''}}\)
is asymptotically normal with mean value zero and variance \( \theta^2\,2^{2\theta+1}/(2^\theta-1)^2 \) if \( \varepsilon_n \rightarrow 0 \) and \( m_n = [n\,\varepsilon_n] \) (when \( \theta = 0 \) the variance is \( 2/(\log 2)^2 \)).
This result allows us, as before, to obtain confidence intervals for the quantile of probability \( 1-\varepsilon_n \).
In Dekkers, Einmahl and de Haan (1989) a general estimator of \(\theta\), also asymptotically normal, was introduced that can be used for all values of \(\theta\); it is a modification of Hill's estimator, which is applicable only for \( \theta > 0 \). Denoting by
\(\mathrm{ M_{n}=\frac{1}{m_{n}} \sum _{1}^{m_{n}} ( log\,{x}_{i}^{''}-log\,x_{{m}_{n}}^{''} ) ^{2} }\),
it is shown that
If \(Q’ ( t ) \) exists and \(t^{1+ \theta }~Q’ ( 1-t ) \) is of slow variation when \(t \downarrow 0 \) then
\(\sqrt[]{m_{n}}\frac{{x}_{{m}_{n}}^{"}-Q ( 1- \varepsilon _{n} ) }{{x}_{m_{n}}^{"} \cdot M_{n}}\)
is asymptotically normal, with mean value zero and variance \(( 1-min ( 0, \theta )) ^{2} \) if \(\varepsilon _{n} \rightarrow 0,~n ~\varepsilon _{n} \rightarrow \infty \) and \(m_{n}= [ n~ \varepsilon _{n} ] \).
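This suggests approximate confidence intervals for \( Q(1-\varepsilon_n) \); a sketch under the additional assumption \( \theta \geq 0 \) (so that the asymptotic variance is 1), with \( m_n = [n\,\varepsilon_n] \):

```python
import numpy as np

def quantile_ci(sample, eps, level_z=1.96):
    """Approximate CI for Q(1 - eps) from x''_{m_n} with m_n = [n*eps],
    using the asymptotic normality of
        sqrt(m_n) * (x''_{m_n} - Q(1-eps)) / (x''_{m_n} * M_n);
    the variance (1 - min(0, theta))^2 is taken as 1 (theta >= 0 assumed)."""
    x = np.sort(np.asarray(sample, dtype=float))[::-1]
    n = x.size
    m = int(n * eps)
    top = x[:m]
    M = np.mean((np.log(top) - np.log(top[m - 1])) ** 2)
    half = level_z * top[m - 1] * M / np.sqrt(m)
    return top[m - 1] - half, top[m - 1] + half

rng = np.random.default_rng(6)
print(quantile_ci(rng.pareto(2.0, 50_000) + 1.0, eps=0.002))
```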
Let us finally study the estimation of exceedance probabilities.
In Dijk and de Haan (1990) the problem is approached in a way somewhat similar to that of the generalized Pareto distribution. Consider a level \( y_n \) such that, with an i.i.d. sample from a population with distribution function \( F(.) \in \mathit{D}(G(.\vert\theta)) \), we have \( n(1-F(y_n)) \rightarrow 0 \). As is well known, the attraction of \( F(.) \) to \( G(.\vert\theta) \) is equivalent to \( n(1-F(\lambda_n+\delta_n\,z)) \rightarrow (1+\theta\,z)_+^{-1/\theta} \). With \( m_n \rightarrow \infty,\ \frac{m_n}{n} \rightarrow 0 \), we also have \( \frac{n}{m_n}(1-F(\lambda_{n/m_n}+\delta_{n/m_n}\,z)) \rightarrow (1+\theta\,z)_+^{-1/\theta} \), and so
\(\mathrm{ 1-F ( y_{n} ) \approx \frac{m_{n}}{n} ( 1+ \theta \frac{y_{n}- \lambda _{n/m_{n}}}{ \delta _{n/m_{n}}} ) _{+}^{-1/ \theta } }\)
which suggests the estimator
\(\mathrm{ F_{n}^{*} ( y_{n} ) =1-\frac{m_{n}}{n} ( 1+ \theta _{n}^{*}\frac{y_{n}- \lambda _{n/m_{n}}}{ \delta _{n/m_{n}}} ) _{+}^{-1/ \theta _{n}^{*}} }\)
where we write \( y_n = \lambda_{n/m_n} + \delta_{n/m_n}\,\frac{c_n^\theta - 1}{\theta} \)
(note that \(\mathrm{c_{n} \sim \frac{m_{n}}{n ( 1-F ( y_{n} ) ) } }\)).
Then the asymptotic normality of \(\mathrm{ { F}_{n}^{*} ( y ) }\) is proved.
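A sketch of \( F_n^*(y_n) \), with \( \lambda_{n/m_n} \) estimated by \( x''_{m_n} \) and \( \delta_{n/m_n} \) by \( \theta_n^*(x''_{m_n}-x''_{2m_n})/(1-2^{-\theta_n^*}) \); these order-statistic substitutions are assumptions of this illustration, chosen to be consistent with the quantile formula above, not the paper's prescription:

```python
import numpy as np

def tail_prob_third(sample, m, y):
    """F_n^*-type estimate of P(X > y) from x''_m, x''_2m, x''_4m,
    with Pickands' theta* and order-statistic-based location/scale."""
    n = len(sample)
    x = np.sort(np.asarray(sample, dtype=float))[::-1]
    xm, x2m, x4m = x[m - 1], x[2 * m - 1], x[4 * m - 1]
    theta = np.log((xm - x2m) / (x2m - x4m)) / np.log(2.0)
    delta = theta * (xm - x2m) / (1.0 - 2.0 ** (-theta))
    base = max(1.0 + theta * (y - xm) / delta, 0.0)
    return (m / n) * base ** (-1.0 / theta)

rng = np.random.default_rng(7)
z = rng.pareto(2.0, 50_000) + 1.0     # true P(X > y) = y^(-2)
print(tail_prob_third(z, m=200, y=50.0), 50.0 ** -2)
```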
Let us put \( q_\theta(x) = \int_1^x s^{\theta-1}\log s\;ds \) \( \bigl[\,\sim (x^\theta\log x)/\theta\ \text{if } \theta>0,\ \sim(\log x)^2/2\ \text{if } \theta=0,\ \text{and } \sim 1/\theta^2\ \text{if } \theta<0\,\bigr] \) and \( c_n^* = \frac{m_n}{n(1-F_n^*(y_n))} \), and denote by \( \theta_n^* \) the estimator of \(\theta\) given in Dekkers, Einmahl and de Haan (1989). Then we can prove that:
If \( m_n \rightarrow \infty \), \( m_n/n \rightarrow 0 \), \( \dfrac{q_\theta(c_n)}{c_n^\theta\sqrt{m_n}} \rightarrow 0 \),
\( \dfrac{\sqrt{m_n}}{q_\theta(c_n)}\Bigl(\dfrac{c_n^\theta-1}{\theta}\Bigr)^2 \sup_{u \geq \lambda_{n/m_n}}\Bigl\vert 1+\theta+\dfrac{(1-F(u))\,F''(u)}{F'(u)^2}\Bigr\vert \rightarrow 0 \)
as \( n \rightarrow \infty \), and the conditions that allow the estimation of \(\theta\) by \( \theta_n^* \) hold, then
\( \dfrac{\sqrt{m_n}\;c_n^{*\,\theta_n^*}}{q_{\theta_n^*}(c_n^*)}\Bigl[\dfrac{1-F(y_n)}{1-F_n^*(y_n)} - 1\Bigr] \)
is asymptotically normal with mean value zero and variance \( 1+\theta^2 \) if \( \theta \geq 0 \), and \( \dfrac{(2-3\theta)^2}{1-2\theta} + \dfrac{(1-2\theta)^2(5-11\theta)}{(1-3\theta)(1-4\theta)} - \dfrac{4(2-3\theta)(1-2\theta)}{1-3\theta} \) if \( \theta < 0 \).
The choice of \( \mathrm{m_{n}}\) is strongly dependent on the initial distribution function \(\mathrm{ F }\); for instance for a Cauchy distribution we must have \(\mathrm{ m_{n}=O ( n^{4/5} ) }\) but for the normal distribution we must have \(\mathrm{ m_{n}=O (( log\,n) ^{2} ) }\).
Although the results of the third approach may be difficult to use for the estimation of the exceedance probability, it seems to be the most appropriate procedure, as it only assumes the existence of an asymptotic distribution of maxima for the actual distribution, all observations being supposed i.i.d., instead of using the asymptotics directly and afterwards, when possible, correcting for bias. Although for this procedure there is a study of the optimal \( m_n \) by Dekkers and de Haan (1991), the results are so different that there does not seem to exist a general rule that can be used universally. To compare the first and the third methods, one way could be the comparison of the asymptotic MSEs. In fact, as the MSE for the first method is of the order of \( \frac{A}{N} \) or, as \( N \sim n(1-F(u_n)) \), of the order of \( \frac{A}{n(1-F(u_n))} \), and for the third method the MSE (for the first estimator \( \theta^* \) the MSE equals the variance) is of the order of \( \frac{B}{m_n} \) (the exact values of the constants \(A\) and \(B\) are given in the text), we have \( \frac{MSE_1}{MSE_3} = O\bigl(\frac{m_n}{n(1-F(u_n))}\bigr) \). But, in fact, comparison is difficult, as the optimal (or approximately optimal) \( u_n \) in the first method depends on the assumed behaviour of \( 1-F(x) \), with \(A\) depending on the constants of slow variation, while \( m_n \) in the third method depends on the behaviour of \( Q(v) \) through \( g(t) \).
If we consider \( F(x) = \Phi_\alpha(x) \) we see that \( m_n \sim B\,n^{2/3} \) and \( n(1-\Phi_\alpha(u_n)) \sim A\,n^{2/3} \), \( m_n \) being optimal for the second estimator in the third approach, so the two methods are of the same order.
We can also devise a statistical choice procedure to choose between \(\mathrm{ \theta <0, \theta =0 }\) and \(\mathrm{ ~ \theta >0~ }\), analogous to the one developed by Tiago de Oliveira (1981); see Tiago de Oliveira (1991b) for details.
Davison, A. C. and Smith, R. L., 1990. Models for exceedances over high thresholds (with discussion). J. Roy. Statist. Soc. B, 52, 393-442.
de Haan, L., 1970. On regular variation and its applications to weak convergence of sample extremes. Math. Centre Tracts, 32, Amsterdam.
Mejzler, D., 1949. On a theorem of B. V. Gnedenko. Sb. Trudov Inst. Math. Akad. Nauk SSR, 12, 31-35 (in Russian).
Pickands, J. III, 1989. Consistent estimation of extreme quantiles. Preprint, University of Pennsylvania.
Reiss, R.-D., 1989. Approximate Distributions of Order Statistics. Springer-Verlag.
Smith, R. L., 1989. Approximations in extreme value theory. Preprint, Center for Stochastic Processes, University of North Carolina at Chapel Hill.
Tiago de Oliveira, J., 1981. Statistical choice of univariate extreme models. In Statistical Distributions in Scientific Work, C. Taillie et al., eds., 6, 367-387, D. Reidel, Dordrecht.
Tiago de Oliveira, J., 1982. The δ-method for obtention of asymptotic distributions; applications. Publ. Inst. Statist. Univ. Paris, 27, 49-70.
Tiago de Oliveira, J., 1984b. Initiation of statistical decision for Weibull distribution. Cuad. Bioestad. Aplic. Informat., 2, 495-499.
Tiago de Oliveira, J., 1991. Perspectivas sobre a estatística de extremos e problemas em aberto [Perspectives on the statistics of extremes and open problems]. Rev. Real Acad. Ciencias Exactas, Físicas y Naturales, Madrid.
Tiago de Oliveira, J., 1991b. Tail estimation and extremes. Homage to Ahmed A. Sarhan, Alexandria, Egypt.