EventStream.data.preprocessing.preprocessor module
The base class for Polars friendly data pre-processors.
This file contains the abstract base class for Polars pre-processors. It only defines the interface expected by the data preprocessing pipeline. Subclasses (defined in other files in this module) contain the actual implementations of the algorithms.
- class EventStream.data.preprocessing.preprocessor.Preprocessor
Bases: ABC
The base class for Polars friendly data pre-processors.
Concrete pre-processors should sub-class this class. A sub-class must define the schema of the output column produced by the pre-processor, the fit method, which extracts the model parameters from the raw data via a Polars expression, and the predict method, which applies the pre-processing to a data column expression using another column containing the model parameters for that data element.
- abstract fit_from_polars(column: Expr) → Expr
Fit the pre-processing model over the data contained in column.
Performs the logic necessary to fit the pre-processing model over the data in the input column. As the input column is a Polars expression, it does not contain materialized data, but rather just references a column operation that could be run to produce materialized data. The pre-processing logic must be consistent with that assumption. Must be implemented by a sub-class. The logic used in this method must be applicable for use in both a select and a groupby aggregation context.
- abstract classmethod params_schema() → dict[str, DataType]
The schema of the output column produced by the pre-processor.
Must be implemented by a sub-class.
- abstract classmethod predict_from_polars(column: Expr, model_column: Expr) → Expr
Predicts for the data in column given the fit parameters in model_column.
Performs the logic necessary to “predict”, as defined by the implementing sub-class, over the data in the input column according to the parameters in the fit model column. As both input columns are Polars expressions, they do not contain materialized data, but rather just reference column operations that could be run to produce materialized data. The pre-processing logic must be consistent with that assumption. Must be implemented by a sub-class. The logic used in this method must be applicable for use in both a select and a groupby aggregation context.