pyyeti.stats.order_stats

pyyeti.stats.order_stats(which, *, p=None, c=None, n=None, r=None)

Compute a parameter from order statistics.

Parameters:

which (str) – Either ‘p’, ‘c’, ‘n’, or ‘r’ to specify which of the following arguments is to be computed from the others.
p (scalar or array_like; real, (0, 1)) – Proportion of the population
c (scalar or array_like; real, (0, 1)) – Probability of bounding proportion p of the population (confidence level).
n (scalar or array_like; integer) – Sample size
r (scalar or ndarray; integer) – Largest-value order statistic: r = 1 is the largest sample, r = 2 the second largest, and so on. Note:
```
number of failures = r - 1
```
Note

Zero will be returned for r if there are not enough samples to meet the criteria. For example, it takes at least 230 samples to reach a p=0.99, c=0.90 (P99/90) level. Therefore, order_stats("r", p=0.99, c=0.90, n=25) will return 0.

Returns:

One of p, c, n, or r, according to which.

Notes

One of the inputs p, c, n, and r (the one named by which) can be left as None; the remaining inputs must be named and must be broadcast-compatible.

The binomial distribution forms the mathematical foundation of this routine; see reference [1]. Reference [2] gives a good definition of the order statistic. See also “Bernoulli Trials”, reference [3], which ties some of these ideas together in the analysis of success/failure probabilities.
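As a sketch of that binomial foundation, assuming a continuous population so that ties can be ignored: the number of the n samples that exceed the p-quantile follows Binomial(n, 1 - p), and the r-th highest sample lies at or above the p-quantile exactly when at least r samples exceed it. The helper confidence below is hypothetical (not part of pyyeti) and uses only the standard library; for p=0.99, n=700, r=4 it should agree with the c value computed in the examples.

```python
from math import comb

def confidence(p, n, r):
    """Hypothetical helper (not part of pyyeti): probability that the r-th
    highest of n samples is at or above the p-quantile of a continuous
    population, i.e. that at least r samples exceed that quantile."""
    q = 1.0 - p  # chance that a single sample exceeds the p-quantile
    # P(fewer than r exceedances): binomial terms for k = 0 .. r-1
    below = sum(comb(n, k) * q**k * p ** (n - k) for k in range(r))
    return 1.0 - below

print(confidence(0.99, 700, 4))  # about 0.9193
print(confidence(0.99, 230, 1))  # equals 1 - 0.99**230, just over 0.90
```

For r = 1 the sum has a single term, so the confidence reduces to 1 - p**n.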

References

Examples

Start with 700 samples from an unknown distribution. After sorting, which sample represents at least a P99/90 level? From published tables, r should be 4, meaning the 4th highest of the 700 values is an estimate of the P99/90 level (or higher). Equivalently, 3 failures (or fewer) out of 700 trials demonstrate at least a P99/90 level.

>>> from pyyeti.stats import order_stats
>>> order_stats('r', p=.99, c=.90, n=700)
4
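The r result can be reproduced, as a hedged sketch assuming the binomial model of a continuous population, by accumulating binomial terms until the confidence drops below c. largest_r is a hypothetical helper, not pyyeti code; like the behavior described in the Note for r, it returns 0 when even the sample maximum falls short.

```python
from math import comb

def largest_r(p, c, n):
    """Hypothetical helper: largest r whose r-th highest of n samples bounds
    proportion p with confidence >= c; returns 0 if even r = 1 falls short."""
    q = 1.0 - p
    r = 0
    at_most_k = 0.0  # running P(at most k of the n samples exceed the p-quantile)
    for k in range(n):
        at_most_k += comb(n, k) * q**k * p ** (n - k)
        # the (k+1)-th highest sample is a bound when more than k samples exceed
        if 1.0 - at_most_k >= c:
            r = k + 1
        else:
            break  # confidence only falls as r grows, so stop here
    return r

print(largest_r(0.99, 0.90, 700))  # 4
print(largest_r(0.99, 0.90, 25))   # 0
```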

With the confidence held at 90%, the proportion of the population bounded must be at least 99%. But what did it actually turn out to be?

>>> order_stats('p', c=.90, n=700, r=4)
0.99048109...
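For r > 1 there is no closed form for p, so one option is a root search. The sketch below uses plain bisection (a swapped-in technique; pyyeti may use a different solver) on a hypothetical binomial confidence helper, relying on the confidence falling monotonically as p rises for fixed n and r.

```python
from math import comb

def confidence(p, n, r):
    """Hypothetical helper: P(at least r of n samples exceed the p-quantile)."""
    return 1.0 - sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(r))

def solve_p(c, n, r, tol=1e-12):
    """Hypothetical helper: bisection for the p at which confidence(p, n, r) == c.
    Confidence falls from 1 at p = 0 to 0 at p = 1, so (0, 1) brackets the root."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if confidence(mid, n, r) > c:
            lo = mid  # still more confident than c: the proportion can grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(solve_p(0.90, 700, 4))  # about 0.99048
```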

Now hold the proportion constant instead. What is the probability of bounding 99% of the population by selecting the 4th highest of 700?

>>> order_stats('c', p=.99, n=700, r=4)
0.91927834...

How many samples did we really need to reach at least the P99/90 level by selecting the 4th highest?

>>> order_stats('n', p=.99, c=.90, r=4)
667
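Because the confidence grows with n for fixed p and r, the minimum sample size can be found with a simple upward search. min_samples and confidence are hypothetical helpers for illustration, assuming the binomial model; a linear scan is easy to check but slow for very large n, where a bisection over n would be the better choice.

```python
from math import comb

def confidence(p, n, r):
    """Hypothetical helper: P(at least r of n samples exceed the p-quantile)."""
    return 1.0 - sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(r))

def min_samples(p, c, r):
    """Hypothetical helper: smallest n whose r-th highest sample bounds
    proportion p with confidence at least c (linear search over n)."""
    n = r  # an r-th highest value needs at least r samples
    while confidence(p, n, r) < c:
        n += 1
    return n

print(min_samples(0.99, 0.90, 4))  # 667
```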

Generate a 90% confidence table showing the number of samples needed, with r going from 1 to 12 (the rows) and population coverage p taking the values [.95, .97725, .99, .9973, .99865] (the columns). Display it using a pandas DataFrame:

>>> import numpy as np
>>> from pandas import DataFrame
>>> r = np.arange(1, 13).reshape(-1, 1)
>>> p = [.95, .97725, .99, .9973, .99865]
>>> table = order_stats('n', c=.90, r=r, p=p)
>>> DataFrame(table, index=r.ravel(), columns=p)
    0.95000  0.97725  0.99000  0.99730  0.99865
1        45      101      230      852     1705
2        77      170      388     1440     2880
3       105      233      531     1970     3941
4       132      292      667     2473     4947
5       158      350      798     2959     5920
6       184      406      926     3433     6868
7       209      461     1051     3899     7800
8       234      516     1175     4358     8717
9       258      569     1297     4811     9624
10      282      622     1418     5259    10521
11      306      675     1538     5704    11410
12      330      727     1658     6145    12293
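For r = 1 (the sample maximum, zero failures) the binomial condition reduces to 1 - p**n >= c, so the first row of the table has a closed form, n = ceil(ln(1 - c) / ln(p)). The helper min_samples_r1 below is hypothetical (not part of pyyeti); for p = 0.99 and c = 0.90 it gives the 230-sample minimum quoted in the Note for r.

```python
import math

def min_samples_r1(p, c):
    """Hypothetical helper: smallest n with 1 - p**n >= c (r = 1 only)."""
    # p**n <= 1 - c  <=>  n >= log(1 - c) / log(p)  (log(p) < 0 flips the sign)
    return math.ceil(math.log(1.0 - c) / math.log(p))

coverage = [0.95, 0.97725, 0.99, 0.9973, 0.99865]
print([min_samples_r1(p, 0.90) for p in coverage])  # [45, 101, 230, 852, 1705]
# With only 25 samples, even the maximum reaches roughly 22% confidence
print(1.0 - 0.99**25)
```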
