pyyeti.stats.order_stats
- pyyeti.stats.order_stats(which, *, p=None, c=None, n=None, r=None)
Compute a parameter from order statistics.
- Parameters:
which (str) – Either ‘p’, ‘c’, ‘n’, or ‘r’ to specify which of the following arguments is to be computed from the others.
p (scalar or array_like; real, (0, 1)) – Proportion of the population.
c (scalar or array_like; real, (0, 1)) – Probability of bounding proportion p of the population (confidence level).
n (scalar or array_like; integer) – Sample size.
r (scalar or ndarray; integer) – Largest-value order statistic: r = 1 is the largest sample, r = 2 the second largest, and so on. Note: number of failures = r - 1.
Note
Zero will be returned for r if there are not enough samples to meet the criteria. For example, it takes at least 230 samples to reach a p=0.99, c=0.90 (P99/90) level. Therefore, order_stats("r", p=0.99, c=0.90, n=25) will return 0.
- Returns:
One of p, c, n, or r; according to which.
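For the special case r = 1 (no failures), the 230-sample figure in the note on r can be checked by hand: the largest of n independent samples falls below the population's p-quantile only if every sample does, which happens with probability p**n, so c = 1 - p**n. The sketch below solves that relation for n. It covers only r = 1 and is an illustration, not pyyeti's implementation; `min_samples_r1` is a hypothetical name.

```python
import math

def min_samples_r1(p, c):
    # Hypothetical helper: with r = 1, c = 1 - p**n, so the smallest n
    # satisfying 1 - p**n >= c is ceil(log(1 - c) / log(p)).
    return math.ceil(math.log(1.0 - c) / math.log(p))

n_min = min_samples_r1(0.99, 0.90)
print(n_min)  # 230

# With only 25 samples, the confidence reached for p = 0.99 is far below 90%,
# which is why order_stats("r", p=0.99, c=0.90, n=25) has no valid r.
c25 = 1.0 - 0.99**25
print(c25)  # roughly 0.22
```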
Notes
One of the inputs p, c, n, and r (the one named by which) can be left as None; the remaining inputs must be named and broadcast-compatible.
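Assuming order_stats follows standard NumPy broadcasting rules (the Notes only state that the inputs must be broadcast-compatible), the shape of the table generated in the Examples follows directly from the shapes of r, p, and c. A quick check with `numpy.broadcast`:

```python
import numpy as np

r = np.arange(1, 13).reshape(-1, 1)               # column vector, shape (12, 1)
p = np.array([.95, .97725, .99, .9973, .99865])   # shape (5,)
c = 0.90                                          # scalar, broadcasts against everything

# Broadcasting (12, 1) with (5,) and a scalar gives a (12, 5) result:
# one row per r value, one column per p value.
shape = np.broadcast(r, p, c).shape
print(shape)  # (12, 5)
```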
The binomial distribution forms the mathematical foundation of this routine; see reference [1]. Reference [2] gives a good definition of the order statistic. See also "Bernoulli Trials", reference [3], which ties some of these ideas together in the analysis of success/failure probabilities.
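The binomial connection can be made concrete. If each sample independently lands above the population's p-quantile with probability 1 - p, the number of such samples is Binomial(n, 1 - p), and the r-th largest sample bounds proportion p exactly when at least r samples land above that quantile. A minimal sketch of the resulting confidence, assuming SciPy is available (`confidence` is a hypothetical helper, not part of pyyeti):

```python
from scipy.stats import binom

def confidence(p, n, r):
    """Probability that the r-th largest of n samples bounds proportion p.

    The count X of samples above the p-quantile is Binomial(n, 1 - p).
    We need X >= r, and P(X >= r) = P(X > r - 1) is the survival function.
    """
    return binom.sf(r - 1, n, 1 - p)

print(confidence(0.99, 700, 4))  # about 0.919
```

For p=.99, n=700, r=4 this agrees with the 0.91927834... value shown in the Examples.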
References
Examples
Start with 700 samples of unknown distribution. After sorting, which of the samples represents at least a P99/90 level? From published tables, r should be 4, meaning the 4th highest value of the 700 is an estimate of the P99/90 level (or higher). Another way to look at that result is that 3 failures (or fewer) out of 700 trials demonstrate at least a P99/90 level.
>>> from pyyeti.stats import order_stats
>>> order_stats('r', p=.99, c=.90, n=700)
4
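Under the binomial model described in the Notes, the confidence drops as r grows, so the r sought here is the largest r whose confidence still meets c, with 0 when even r = 1 falls short. A hedged sketch of that search, assuming SciPy is available (`largest_r` is a hypothetical name; pyyeti may compute r differently):

```python
import numpy as np
from scipy.stats import binom

def largest_r(p, c, n):
    # Confidence for r = 1, 2, ..., n: P(Binomial(n, 1 - p) >= r).
    k = np.arange(n)                # k = r - 1
    conf = binom.sf(k, n, 1 - p)
    # conf never increases with r, so the qualifying r values are 1..r_max;
    # counting them gives r_max, or 0 if even r = 1 falls short.
    return int(np.count_nonzero(conf >= c))

print(largest_r(0.99, 0.90, 700))  # 4
print(largest_r(0.99, 0.90, 25))   # 0
```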
Holding the probability constant at 90%, the portion of the population bounded must be at least 99%. But what did it actually turn out to be?
>>> order_stats('p', c=.90, n=700, r=4)
0.99048109...
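The p value can be recovered numerically: with n and r fixed, the confidence falls steadily from about 1 to about 0 as p rises toward 1, so a bracketing root finder on the binomial expression locates the p at which it equals c. A sketch using `scipy.optimize.brentq`; the root-finding method is an assumption (pyyeti may solve this differently), and `coverage` is a hypothetical name:

```python
from scipy.optimize import brentq
from scipy.stats import binom

def coverage(c, n, r):
    # Confidence minus target; positive near p = 0, negative near p = 1.
    def excess(p):
        return binom.sf(r - 1, n, 1 - p) - c
    # Bracket just inside (0, 1) to stay on valid binomial probabilities.
    return brentq(excess, 1e-9, 1 - 1e-9, xtol=1e-14)

print(coverage(0.90, 700, 4))  # about 0.99048
```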
Instead, hold the portion constant. What is the probability of covering 99% of the population by selecting the 4th highest of 700?
>>> order_stats('c', p=.99, n=700, r=4)
0.91927834...
How many samples did we really need to reach at least the P99/90 level by selecting the 4th highest?
>>> order_stats('n', p=.99, c=.90, r=4)
667
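For fixed p and r, the confidence rises with n, so the smallest adequate sample size can be found by stepping n upward until the binomial probability first reaches c. A simple linear-search sketch, assuming SciPy is available (`min_samples` is a hypothetical name, not pyyeti's method):

```python
from scipy.stats import binom

def min_samples(p, c, r):
    # Start at n = r (fewer samples cannot have an r-th largest) and
    # increase n until P(Binomial(n, 1 - p) >= r) first reaches c.
    n = r
    while binom.sf(r - 1, n, 1 - p) < c:
        n += 1
    return n

print(min_samples(0.99, 0.90, 4))  # 667
print(min_samples(0.99, 0.90, 1))  # 230
```

A linear search keeps the logic obvious; for very large n, a doubling-then-bisection search on the same monotone function would need far fewer evaluations.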
Generate a 90% confidence table showing the number of trials needed, with r going from 1 to 12 (defining the rows) and population coverage p of [.95, .97725, .99, .9973, .99865] (defining the columns). Display it using a pandas DataFrame:
>>> import numpy as np
>>> from pandas import DataFrame
>>> r = np.arange(1, 13).reshape(-1, 1)
>>> p = [.95, .97725, .99, .9973, .99865]
>>> table = order_stats('n', c=.90, r=r, p=p)
>>> DataFrame(table, index=r.ravel(), columns=p)
    0.95000  0.97725  0.99000  0.99730  0.99865
1        45      101      230      852     1705
2        77      170      388     1440     2880
3       105      233      531     1970     3941
4       132      292      667     2473     4947
5       158      350      798     2959     5920
6       184      406      926     3433     6868
7       209      461     1051     3899     7800
8       234      516     1175     4358     8717
9       258      569     1297     4811     9624
10      282      622     1418     5259    10521
11      306      675     1538     5704    11410
12      330      727     1658     6145    12293