pyyeti.stats.order_stats

pyyeti.stats.order_stats(which, *, p=None, c=None, n=None, r=None)

Compute a parameter from order statistics.

Parameters:
  • which (str) – One of ‘p’, ‘c’, ‘n’, or ‘r’; names the argument below that is to be computed from the other three.

  • p (scalar or array_like; real, (0, 1)) – Proportion of the population to be bounded.

  • c (scalar or array_like; real, (0, 1)) – Probability of bounding proportion p of the population (confidence level).

  • n (scalar or array_like; integer) – Sample size

  • r (scalar or ndarray; integer) –

    Largest-value order statistic: r = 1 selects the largest sample, r = 2 the second largest, and so on. Note:

    number of failures = r - 1
    

    Note

    Zero will be returned for r if there are not enough samples to meet the criteria. For example, it takes at least 230 samples to reach a p=0.99, c=0.90 (P99/90) level. Therefore, order_stats("r", p=0.99, c=0.90, n=25) will return 0.
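
The 230-sample figure in the note above can be checked by hand. The sketch below assumes the standard binomial result that, for r = 1, the confidence is 1 - p**n; the helper name min_samples_for_largest is hypothetical and not part of pyyeti.

```python
import math

def min_samples_for_largest(p, c):
    """Smallest n for which the largest of n samples bounds proportion p
    with confidence c (hypothetical helper, r = 1 case only)."""
    # Assumption: with r = 1 the confidence is 1 - p**n, so solve
    # 1 - p**n >= c for n, which gives n >= log(1 - c) / log(p).
    return math.ceil(math.log(1.0 - c) / math.log(p))

n_min = min_samples_for_largest(0.99, 0.90)
print(n_min)  # 230
```

With only 25 samples the same relation gives a confidence of about 0.22, well short of 0.90, which is consistent with r being returned as 0 in that case.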

Returns:

One of p, c, n, or r, as selected by which.

Notes

Leave the input that is being computed (one of p, c, n, or r) as None. The remaining three inputs must be broadcast-compatible and must be passed by keyword.

The binomial distribution forms the mathematical foundation of this routine; see reference [1]. [2] has a good definition of the order statistic. See also “Bernoulli Trials”, reference [3], which ties some of these ideas together in the analysis of success/failure probabilities.
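
As a rough illustration of the binomial connection, the sketch below computes a confidence level directly. It assumes the usual argument: each sample independently exceeds the p-quantile with probability 1 - p, and the r-th largest sample bounds proportion p exactly when at least r samples exceed that quantile. The function confidence is a hypothetical stand-in written for this note, not the routine's actual implementation.

```python
from math import comb

def confidence(p, n, r):
    """Probability that the r-th largest of n samples bounds proportion p
    (hypothetical helper based on the binomial distribution)."""
    q = 1.0 - p  # chance that a single sample falls above the p-quantile
    # Probability that fewer than r of the n samples exceed the p-quantile.
    fewer_than_r = sum(comb(n, k) * q**k * p**(n - k) for k in range(r))
    return 1.0 - fewer_than_r

c = confidence(0.99, 700, 4)
print(round(c, 4))  # 0.9193
```

For r = 1 the sum has a single term, p**n, so the expression reduces to 1 - p**n.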

References

Examples

Start with 700 samples of unknown distribution. After sorting, which of the samples represents at least a P99/90 level? From published tables, r should be 4, meaning the 4th highest value of the 700 is an estimate of the P99/90 level (or higher). Another way to look at that result is that 3 failures (or fewer) out of 700 trials demonstrate at least a P99/90 level.

>>> from pyyeti.stats import order_stats
>>> order_stats('r', p=.99, c=.90, n=700)
4

With the confidence held at 90%, the proportion of the population bounded by the 4th highest value must be at least 99%. What is it exactly?

>>> order_stats('p', c=.90, n=700, r=4)
0.99048109...

Instead, hold the proportion constant. What is the probability of covering 99% of the population by selecting the 4th highest of 700?

>>> order_stats('c', p=.99, n=700, r=4)
0.91927834...

How many samples did we really need to reach at least the P99/90 level by selecting the 4th highest?

>>> order_stats('n', p=.99, c=.90, r=4)
667
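
The 'n' and 'r' results above can be cross-checked with a brute-force search. This is a minimal sketch under the assumption that the confidence is the binomial probability of at least r samples exceeding the p-quantile; the names confidence, smallest_n, and largest_r are hypothetical and the search strategy is chosen for clarity, not taken from pyyeti.

```python
from math import comb

def confidence(p, n, r):
    # P(at least r of n samples exceed the p-quantile); binomial, rate 1 - p
    q = 1.0 - p
    return 1.0 - sum(comb(n, k) * q**k * p**(n - k) for k in range(r))

def smallest_n(p, c, r):
    # Confidence grows with n for fixed r, so step up from n = r
    # until the target confidence c is reached.
    n = r
    while confidence(p, n, r) < c:
        n += 1
    return n

def largest_r(p, c, n):
    # Confidence shrinks as r grows, so keep the last r that still
    # reaches c; the result is 0 when even r = 1 falls short.
    r = 0
    while r < n and confidence(p, n, r + 1) >= c:
        r += 1
    return r

print(smallest_n(0.99, 0.90, 4))   # 667
print(largest_r(0.99, 0.90, 700))  # 4
print(largest_r(0.99, 0.90, 25))   # 0
```

The last line reproduces the case where there are too few samples to reach P99/90, so no order statistic qualifies.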

Generate a 90% confidence table showing the number of trials needed. The rows are r, from 1 to 12, and the columns are the population coverage values [.95, .97725, .99, .9973, .99865]. Display the result using a pandas DataFrame:

>>> import numpy as np
>>> from pandas import DataFrame
>>> r = np.arange(1, 13).reshape(-1, 1)
>>> p = [.95, .97725, .99, .9973, .99865]
>>> table = order_stats('n', c=.90, r=r, p=p)
>>> DataFrame(table, index=r.ravel(), columns=p)
    0.95000  0.97725  0.99000  0.99730  0.99865
1        45      101      230      852     1705
2        77      170      388     1440     2880
3       105      233      531     1970     3941
4       132      292      667     2473     4947
5       158      350      798     2959     5920
6       184      406      926     3433     6868
7       209      461     1051     3899     7800
8       234      516     1175     4358     8717
9       258      569     1297     4811     9624
10      282      622     1418     5259    10521
11      306      675     1538     5704    11410
12      330      727     1658     6145    12293