EventStream.data.preprocessing.standard_scaler module

Pre-processor that normalizes data to have zero mean and unit variance.

class EventStream.data.preprocessing.standard_scaler.StandardScaler

Bases: Preprocessor

Normalizes data to have zero mean and unit variance.

This is a concrete implementation of the Preprocessor abstract class. It is Polars-friendly: both fitting and prediction are expressed as Polars expressions, so they can be used in a select context as well as in a groupby aggregation context.

Examples

>>> import polars as pl
>>> S = StandardScaler()
>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5]})
>>> params = S.fit_from_polars(pl.col("a")).alias("params")
>>> df.select(params)["params"].to_list()
[{'mean_': 3.0, 'std_': 1.5811388300841898}]
>>> norm = S.predict_from_polars(pl.col("a"), params).alias("a_norm")
>>> df.select(norm)["a_norm"].to_list()
[-1.2649110640673518, -0.6324555320336759, 0.0, 0.6324555320336759, 1.2649110640673518]

fit_from_polars(column: Expr) → Expr

Fit the mean and standard deviation of the data in column.

Parameters:
column: Expr

The Polars expression for the column containing the raw data to be pre-processed.

Returns:

A Polars expression for a struct column containing the mean and standard deviation of the data in column, in fields named “mean_” and “std_” respectively.

Return type:

pl.Expr
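
As a rough check on what the fitted parameters contain, the sketch below recomputes them using only the Python standard library. It assumes that “std_” is the sample standard deviation (dividing by n - 1), which is consistent with the value 1.5811388300841898 shown in the class-level example. The helper name `fit_params` is hypothetical and is not part of EventStream.

```python
import statistics


def fit_params(values):
    """Hypothetical helper: mirror the "mean_" / "std_" struct fields."""
    return {
        "mean_": statistics.fmean(values),
        # Assumption: sample standard deviation (divides by n - 1),
        # matching the value shown in the class-level example.
        "std_": statistics.stdev(values),
    }


params = fit_params([1, 2, 3, 4, 5])
print(params)
```

With the population standard deviation (dividing by n) the second field would instead be roughly 1.414, so the class-level example output is what points to the n - 1 convention.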

classmethod params_schema() → dict[str, DataType]

Returns {“mean_”: pl.Float64, “std_”: pl.Float64}.

classmethod predict_from_polars(column: Expr, model_column: Expr) → Expr

Returns (column - model_column.struct.field("mean_")) / model_column.struct.field("std_").

Parameters:
column: Expr

The Polars expression for the column containing the raw data to be centered and scaled.

model_column: Expr

The Polars expression for a struct column containing “mean_” and “std_” fields.

Returns:

(column - model_column.struct.field("mean_")) / model_column.struct.field("std_")

Return type:

pl.Expr
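
The centering and scaling formula can be illustrated in plain Python. This is a minimal sketch rather than the library's implementation: `standardize` is a hypothetical helper, and an ordinary dict stands in for the struct column holding the “mean_” and “std_” fields.

```python
def standardize(values, params):
    """Hypothetical helper: apply (x - mean_) / std_ element-wise."""
    return [(x - params["mean_"]) / params["std_"] for x in values]


# Parameters copied from the fitted struct in the class-level example.
params = {"mean_": 3.0, "std_": 1.5811388300841898}
scaled = standardize([1, 2, 3, 4, 5], params)
print(scaled)
```

On this input the result should agree with the a_norm values in the class-level example, up to floating-point rounding. Note that a constant column has a standard deviation of 0, so the division is undefined there; how StandardScaler handles that case is not stated in this documentation.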