EventStream.data.preprocessing.stddev_cutoff module
Pre-processor that filters data to contain only values within a certain number of standard deviations from the mean.
- class EventStream.data.preprocessing.stddev_cutoff.StddevCutoffOutlierDetector(stddev_cutoff: float = 5.0)
Bases: Preprocessor
Filters out data elements that are outside a specifiable number of standard deviations of the mean.
This is a concrete implementation of the Preprocessor abstract class. It identifies outliers, defined here as data points more than a specifiable number of standard deviations away from the mean. It is Polars-friendly: it is implemented as a Polars expression that can be used in both a select and a groupby aggregation context.
- stddev_cutoff
The number of standard deviations from the mean to use as the cutoff for identifying outliers. Defaults to 5.0.
Examples
>>> import polars as pl
>>> S = StddevCutoffOutlierDetector(stddev_cutoff=1.0)
>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> params = S.fit_from_polars(pl.col("a")).alias("params")
>>> df.select(params)["params"].to_list()
[{'thresh_large_': 4.58113883008419, 'thresh_small_': 1.4188611699158102}]
>>> outliers = S.predict_from_polars(pl.col("a"), params).alias("a_outliers")
>>> df.select(outliers)["a_outliers"].to_list()
[True, False, False, False, True]
- fit_from_polars(column: Expr) -> Expr
Identify the configured large and small extreme value thresholds from the data in column.
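The thresholds in the Examples output are consistent with the mean plus or minus stddev_cutoff times the standard deviation. The following is a minimal plain-Python sketch of that rule, not the library's implementation. It assumes the sample (ddof=1) standard deviation, which matches the values shown in the example; the helper name `stddev_thresholds` is hypothetical and is not part of EventStream.

```python
import statistics


def stddev_thresholds(values, stddev_cutoff=5.0):
    """Hypothetical helper: return mean +/- stddev_cutoff * standard deviation."""
    mean = statistics.mean(values)
    # Assumption: sample standard deviation (ddof=1), chosen because it
    # reproduces the thresholds shown in the documented example.
    std = statistics.stdev(values)
    return {
        "thresh_large_": mean + stddev_cutoff * std,
        "thresh_small_": mean - stddev_cutoff * std,
    }


# Same data and cutoff as the documented example.
params = stddev_thresholds([1, 2, 3, 4, 5], stddev_cutoff=1.0)
print(params)
```

With a cutoff of 1.0 and a mean of 3, the two thresholds sit one standard deviation (about 1.58) on either side of the mean.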
- classmethod params_schema() -> dict[str, DataType]
Returns {"thresh_large_": pl.Float64, "thresh_small_": pl.Float64}.
- classmethod predict_from_polars(column: Expr, model_column: Expr) -> Expr
Returns a column containing True if and only if the data in column is an outlier.
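The prediction step can be read as a comparison of each value against the two fitted thresholds. The following plain-Python sketch illustrates that reading; it is not the library's code. The helper name `is_outlier` is hypothetical, and the use of strict inequalities is an assumption that happens to agree with the documented example output.

```python
def is_outlier(value, thresh_small, thresh_large):
    """Hypothetical helper: True iff value falls outside the two thresholds."""
    # Assumption: strict comparisons, so a value exactly on a threshold
    # is not flagged as an outlier.
    return value < thresh_small or value > thresh_large


# Thresholds taken from the documented example (stddev_cutoff=1.0).
thresh_small = 1.4188611699158102
thresh_large = 4.58113883008419

flags = [is_outlier(v, thresh_small, thresh_large) for v in [1, 2, 3, 4, 5]]
print(flags)
```

Only the values 1 and 5 fall outside the interval between the thresholds, so they are the only ones flagged.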