pyyeti.dsp.fixtime

pyyeti.dsp.fixtime(olddata, sr=None, *, negmethod='sort', deldrops=True, dropval=-1.4013e-45, delouttimes=True, delspikes=False, base=None, hold_previous_value=False, previous_value_tol=0.001, getall=False, verbose=True)

Process recorded data to make an even time vector.

Parameters:
  • olddata (2d ndarray or 2-element tuple/list) – If ndarray, it must have 2 columns: [time, signal]. Otherwise, it must be a 2-element tuple or list, e.g., (time, signal).

  • sr (scalar, string or None; optional) – If scalar, specifies the sample rate. If ‘auto’, the algorithm chooses a “best” fit. If None, the user is prompted for it after some sample rate statistics are displayed.

  • negmethod (string; optional) – Specifies how to handle negative time steps:

    negmethod   Action
    ---------   ---------
    “stop”      Error out
    “sort”      Sort data

  • deldrops (bool; optional) – If True, dropouts are deleted from the data; otherwise, they are left in.

  • dropval (scalar; optional) – The numerical value of drop-outs. Note that any np.nan or np.inf values in the data are treated as drop-outs in any case.

  • delouttimes (bool; optional) – If True, outlier times are deleted from the data; otherwise, they are left in.

  • delspikes (bool or dict; optional) – If False, spikes are not deleted. If True, spikes are deleted by calling despike_diff() with the default inputs listed below. If a dict, you take complete control of the despiking. You can specify one of 3 methods:

    method            Action
    ----------------  -----------------------------------------------------
    “despike_diff”    Call despike_diff() (default)
    “despike”         Call despike()
    “simple”          Detect outliers by standard deviations from a moving
                      average through the signal.

    For example, to set the method to “despike”, the number of standard deviations to 12, the window size to 25, the maximum iterations to 100, and the threshold_value to 0.25:

    delspikes=dict(method='despike', sigma=12, n=25,
                   maxiter=100, threshold_value=0.25)
    

    Defaults are defined for some parameters (others are accepted from the definition of despike_diff() or despike() … ‘simple’ only uses these three). The defaults are:

    method = 'despike_diff'
    n = 15
    sigma = 8
    maxiter = -1   # negative value means no limit
    
  • base (scalar or None; optional) – Scalar value that the new time vector will hit exactly if it is within range. If None, the new time vector is aligned to the longest section of “good” data.

  • hold_previous_value (bool; optional) – If True, hold the previous value instead of finding the closest value (but see previous_value_tol). The default is False: find the closest value and, in case of a tie, use the previous value. For example:

    olddata = ([0.0, 4.0], [10.0, 20])
    t, y1 = fixtime(olddata, sr=1.0)
    t, y2 = fixtime(olddata, sr=1.0, hold_previous_value=True)
    

    Gives:

    t  --> array([  0.,   1.,   2.,   3.,   4.])
    y1 --> array([ 10.,  10.,  10.,  20.,  20.])
    y2 --> array([ 10.,  10.,  10.,  10.,  20.])
    
  • previous_value_tol (float; optional) – If hold_previous_value is True, a new time value is considered equal to an old time value if it is within previous_value_tol * dt of it, where dt is the new time step. Must be within [0.0, 1.0], inclusive.

  • getall (bool; optional) – If True, return fixinfo in addition to newdata; otherwise, only newdata is returned.

  • verbose (bool; optional) – If True, sample rate statistics are printed. Note that if sr is None, verbose is internally set to True.
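The drop-out and negmethod rules above can be sketched in plain Python. This is a minimal illustration, not pyyeti's implementation: the names remove_drops and handle_negative_steps are hypothetical, and matching dropval within a relative tolerance (rel_tol) is an assumption of this sketch, chosen because a float32 drop-out value converted to double (-1.401298464324817e-45) is not exactly equal to -1.4013e-45.

```python
import math
import warnings


def remove_drops(t, y, dropval=-1.4013e-45, rel_tol=1e-3):
    """Delete points whose time or signal is nan/inf, or whose signal
    matches dropval (within rel_tol, an assumption of this sketch)."""
    keep = [
        i
        for i, (ti, yi) in enumerate(zip(t, y))
        if math.isfinite(ti)
        and math.isfinite(yi)
        and not math.isclose(yi, dropval, rel_tol=rel_tol, abs_tol=0.0)
    ]
    return [t[i] for i in keep], [y[i] for i in keep]


def handle_negative_steps(t, y, negmethod="sort"):
    """Apply the negmethod rule: "stop" raises, "sort" sorts by time."""
    nneg = sum(1 for a, b in zip(t, t[1:]) if b < a)
    if nneg == 0:
        return list(t), list(y)
    # A warning is issued in either case, as described for negmethod.
    warnings.warn(f"{nneg} negative time step(s) found")
    if negmethod == "stop":
        raise ValueError("negative time steps found and negmethod='stop'")
    if negmethod == "sort":
        order = sorted(range(len(t)), key=lambda i: t[i])  # stable sort
        return [t[i] for i in order], [y[i] for i in order]
    raise ValueError(f"unknown negmethod: {negmethod!r}")
```

Drop-outs are removed before the time-step checks, following the order of steps 1 and 4 in the Notes.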

Returns:

  • newdata (2d ndarray or tuple) – Cleaned up version of olddata. Will be 2d ndarray if olddata was ndarray; otherwise it is a tuple: (time, data).

  • fixinfo (SimpleNamespace; optional) – Only returned if getall is True. Members:

    • sr_stats : 1d ndarray

      Five-element vector with the sample rate statistics; useful to help the user select the best sample rate or to compare against sr. The five elements are:

      [max_sr, min_sr, ave_sr, max_count_sr, max_count_percent]
      

      The max_count_sr is the sample rate that occurred most often. This is usually the ‘correct’ sample rate. max_count_percent gives the percent occurrence of max_count_sr.

    • tp : 1d ndarray

      Contains indices into old time vector of where time-step shifts (“turning points”) were done to align the new time vector against the old.

    • alldrops : SimpleNamespace or None

      Has 1d indexing arrays into olddata showing the drops:

      Member     Description
      --------   -----------------------------------------------------
      dropouts   shows infs, nans, and dropvals (None if not deldrops)
      outtimes   shows where outlier times were found in olddata
                 (whether they were deleted or not)
      spikes     shows where spikes were found in olddata (None if not
                 delspikes)
      alldrops   merger of dropouts and spikes plus possible points in
                 between those

    • despike_info : SimpleNamespace or None

      If delspikes is True or a dict, despike_info contains:

      Member      Description
      ---------   -----------------------------------------------------
      delspikes   Dict of values used for spike removal (input to
                  despike() for the “despike” methods)
      niter       Number of iterations of spike removal
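For reference, the five sr_stats values can be reproduced from a time vector roughly as follows. This is a sketch under stated assumptions: the helper name sample_rate_stats is hypothetical, and rounding each rate to a fixed bin_width (0.2 here, matching the "nearest 0.2 samples/sec" note in the example output) stands in for the numpy.histogram() binning that fixtime actually uses.

```python
from collections import Counter


def sample_rate_stats(t, bin_width=0.2):
    """Hypothetical re-creation of
    [max_sr, min_sr, ave_sr, max_count_sr, max_count_percent]."""
    steps = [b - a for a, b in zip(t, t[1:])]
    steps = [dt for dt in steps if dt > 0]  # only positive steps define a rate
    max_sr = 1.0 / min(steps)  # smallest step -> highest rate
    min_sr = 1.0 / max(steps)  # largest step -> lowest rate
    ave_sr = 1.0 / (sum(steps) / len(steps))  # rate of the average step
    # Round each step's rate to an integer bin index and count occurrences
    # (fixed bin_width is an assumption; fixtime uses numpy.histogram()).
    bins = Counter(round((1.0 / dt) / bin_width) for dt in steps)
    best_bin, count = bins.most_common(1)[0]
    max_count_sr = best_bin * bin_width
    max_count_percent = 100.0 * count / len(steps)
    return [max_sr, min_sr, ave_sr, max_count_sr, max_count_percent]
```

Applied to the time vector [0., 1., 5., 6.] from the Examples, this gives the same rates that the example prints: 1, 0.25, 0.5, and 1 at 66.7% occurrence.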

Notes

This algorithm works as follows:

  1. Find and delete drop-outs if deldrops is True.

  2. Delete outlier times if delouttimes is True. These are points with times that are more than 3 standard deviations away from the mean. A warning message is printed if any such times are found. Note that on a perfect (evenly spaced) time vector, the end points are only about 1.73 (that is, √3) standard deviations from the mean, so they are never deleted (e.g., for a time vector spanning [0, 1]: mean + 1.73*sigma = 0.5 + 1.73*0.2887 ≈ 1.0).

  3. Check for positive time steps, and if there are none, error out.

  4. Check the time vector for negative steps. Sort or error out as specified by negmethod. Warnings are printed in any case.

  5. Compute and print sample rates for user review. Perhaps the most useful of these printed numbers is the one based on the count. numpy.histogram() is used to count which sample rate occurs most often (to the nearest multiple of 5 in most cases). If that sample rate is printed with a high percentage, it is likely the correct value to use (at least to within 5 samples/second). If sr is None, the user is prompted for it.

  6. Call selected despiker if requested to delete data spikes.

  7. Count the number of small time steps, defined as those that are less than 0.93/sr. If more than 1% of the steps are small, print a warning.

  8. Count the number of large time steps, defined as those that are greater than 1.07/sr. If more than 1% of the steps are large, print a warning.

  9. Make a new, evenly spaced time vector according to the new sample rate that spans the range of time in olddata.

  10. Find the “turning points” in the old time vector. These are the points where the time step differs from the ideal step by more than 1/4 of a step. A warning is issued if the number of turning points is greater than 50% of the total number of points.

  11. If step 10 did not issue a warning about too many turning points, the new time vector is shifted to align with the longest section of “good” old time steps.

  12. Loop over the segments defined by the turning points. Each segment is shifted left or right to fit the new time vector. The longest section is not shifted because of step 11 (unless that step was skipped because of too many turning points).

  13. If base is not None, the new time vector is shifted by up to a half time step so that it hits base exactly (if base is within range).

  14. Fill in the new data vector using the best-fit times. Gaps are filled with flat lines, using the closest value if hold_previous_value is False, or the previous value if hold_previous_value is True. This routine does not do any linear interpolation.
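Several of the screening rules above (steps 2, 7, 8, and 10) reduce to simple arithmetic on the time vector. The sketch below is illustrative only: the function names are hypothetical, it uses the population standard deviation, and it reports each turning point as the index of the step that starts there, which may not match how fixtime indexes tp.

```python
import statistics
import warnings


def outlier_time_mask(t, nsigma=3.0):
    """Step 2: flag times more than nsigma standard deviations from the mean."""
    mu = statistics.fmean(t)
    sd = statistics.pstdev(t)
    return [abs(ti - mu) > nsigma * sd for ti in t]


def endpoint_sigma(npts):
    """How many standard deviations the last point of a perfect, evenly
    spaced time vector lies from the mean (always below sqrt(3))."""
    t = [i / (npts - 1) for i in range(npts)]
    return (t[-1] - statistics.fmean(t)) / statistics.pstdev(t)


def screen_steps(t, sr):
    """Steps 7, 8, and 10: fractions of small/large steps and turning points."""
    dt = 1.0 / sr
    steps = [b - a for a, b in zip(t, t[1:])]
    nsteps = len(steps)
    small = sum(s < 0.93 * dt for s in steps) / nsteps
    large = sum(s > 1.07 * dt for s in steps) / nsteps
    if small > 0.01:
        warnings.warn(f"{100 * small:.1f}% of time steps are small")
    if large > 0.01:
        warnings.warn(f"{100 * large:.1f}% of time steps are large")
    # A turning point: the step differs from the ideal dt by more than dt/4.
    turning = [i for i, s in enumerate(steps) if abs(s - dt) > 0.25 * dt]
    if len(turning) > 0.5 * len(t):
        warnings.warn("too many turning points; alignment (step 11) skipped")
    return small, large, turning
```

For example, endpoint_sigma(1001) is about 1.730, well under the 3-sigma cutoff, which is why step 2 leaves the end points of a clean record alone.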

If despiking is not producing good results:

  1. Spikes very near the ends of the signal (in the first or last window) can cause trouble for the despike_diff() and despike() routines. If exclude_point is ‘first’, spikes in the last window should be avoided (the routine works backward); conversely, if exclude_point is ‘last’, spikes in the first window should be avoided (the routine works forward).

  2. Try increasing/decreasing sigma to make it more/less picky.

  3. If bit toggles or similar small spikes are being considered spikes (which can also make the routine take a very long time to run), setting threshold_value to a suitable value for the current data is often a good solution. Increasing threshold_sigma can also protect these small spikes. Note that the threshold settings are not available for the “simple” delspikes method.

  4. Try a different window size.

  5. Try a different method. They all have strengths and weaknesses, so experiment.
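To experiment with sigma, the window size n, and maxiter outside of fixtime, the “simple” method can be approximated as below. This is a hypothetical reading of "standard deviations from a moving average through the signal": the centered window truncated at the ends, the population standard deviation of the residuals, and deleting flagged points before the next iteration are all assumptions, and simple_despike is not a pyyeti function.

```python
import statistics


def simple_despike(y, n=15, sigma=8.0, maxiter=-1):
    """Return (sorted spike indices into y, number of iterations).

    A negative maxiter means no iteration limit, as for delspikes.
    """
    idx = list(range(len(y)))  # indices of points still in play
    spikes = []
    niter = 0
    half = n // 2  # centered window of n points (n + 1 if n is even)
    while (maxiter < 0 or niter < maxiter) and len(idx) > 1:
        vals = [y[i] for i in idx]
        # Moving average, with the window truncated at the ends.
        ma = []
        for k in range(len(vals)):
            win = vals[max(0, k - half): k + half + 1]
            ma.append(sum(win) / len(win))
        resid = [v - m for v, m in zip(vals, ma)]
        sd = statistics.pstdev(resid)
        bad = [k for k, r in enumerate(resid) if abs(r) > sigma * sd]
        if not bad:
            break
        spikes.extend(idx[k] for k in bad)
        badset = set(bad)
        idx = [i for k, i in enumerate(idx) if k not in badset]
        niter += 1
    return sorted(spikes), niter
```

Lowering sigma flags smaller excursions, and a wider n smooths the moving average so that short bursts stand out more; that mirrors tips 2 and 4 above.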

Examples

>>> from pyyeti import dsp
>>> t = [0., 1., 5., 6.]
>>> y = [1., 2., 3., 4.]
>>> tn, yn = dsp.fixtime((t, y), sr=1)
==> Info: [min, max, ave, count (% occurrence)] time step:
==>           [1, 4, 2, 1 (66.7%)]
==>       Corresponding sample rates:
==>           [1, 0.25, 0.5, 1 (66.7%)]
==>       Note: "count" shows most frequent sample rate to
          nearest 0.2 samples/sec.
==> Using sample rate = 1
>>> tn
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])
>>> yn
array([ 1.,  2.,  2.,  2.,  3.,  3.,  4.])

Repeat, but with hold_previous_value set to True:

>>> tn, yn = dsp.fixtime(
...    (t, y), sr=1, hold_previous_value=True, verbose=False
... )
>>> tn
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])
>>> yn
array([ 1.,  2.,  2.,  2.,  2.,  3.,  4.])
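The two fill rules in these examples can be reproduced with a short pure-Python sketch. It assumes that the new time vector is already built (it skips fixtime's alignment and segment shifting), and fill_values is a hypothetical helper, not part of pyyeti. Closest-value mode breaks ties toward the previous value; hold-previous mode treats an old time within previous_value_tol * dt of a new time as equal to it.

```python
import bisect


def fill_values(told, yold, tnew, hold_previous_value=False,
                previous_value_tol=0.001):
    """Pick a value from (told, yold) for each time in the evenly spaced tnew."""
    dt = tnew[1] - tnew[0]
    out = []
    for tn in tnew:
        if hold_previous_value:
            # Latest old time at or before tn (plus the tolerance window).
            j = bisect.bisect_right(told, tn + previous_value_tol * dt) - 1
            out.append(yold[max(j, 0)])
        else:
            j = bisect.bisect_left(told, tn)  # first old time >= tn
            if j == 0:
                out.append(yold[0])
            elif j == len(told):
                out.append(yold[-1])
            else:
                left, right = told[j - 1], told[j]
                # Closest old time; a tie goes to the previous (left) value.
                out.append(yold[j] if right - tn < tn - left else yold[j - 1])
    return out


told, yold = [0.0, 1.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0]
tnew = [float(i) for i in range(7)]
print(fill_values(told, yold, tnew))
print(fill_values(told, yold, tnew, hold_previous_value=True))
```

The two printed lists match the yn arrays above: [1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 4.0] and [1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4.0].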