EventStream.utils module
Utility functions for the EventStream library.
- class EventStream.utils.JSONableMixin
Bases: object
A simple mixin to enable saving and loading of data container classes to and from JSON files.
This mixin allows easy conversion between Python objects and JSON format, facilitating their storage and retrieval. Subclasses that are not dataclasses must implement a to_dict method that defines how the object should be converted to a dictionary.
- classmethod from_dict(as_dict: dict) -> JSONABLE_INSTANCE_T
Converts a dictionary representation of an object into the calling class.
By default, this method simply calls the constructor of the calling class with the entries of as_dict as keyword arguments. It can be overridden by subclasses for more complex use cases.
Examples
>>> class MyData(JSONableMixin):
...     def __init__(self, name):
...         self.name = name
>>> my_data = MyData.from_dict({'name': 'Test'})
>>> my_data.name
'Test'
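The remark that from_dict can be overridden for more complex use cases can be illustrated with a nested container. The following is a minimal sketch rather than the library's code: `_MixinSketch` is a hypothetical stand-in that only mimics the default behavior described above, and `Point` and `Segment` are invented for illustration.

```python
import dataclasses


class _MixinSketch:
    """Hypothetical stand-in mimicking the default from_dict described above."""

    @classmethod
    def from_dict(cls, as_dict: dict):
        # Default behavior: pass the dictionary entries as keyword arguments.
        return cls(**as_dict)


@dataclasses.dataclass
class Point(_MixinSketch):
    x: int
    y: int


@dataclasses.dataclass
class Segment(_MixinSketch):
    start: Point
    end: Point

    @classmethod
    def from_dict(cls, as_dict: dict) -> "Segment":
        # Override: rebuild the nested Point objects before calling the constructor.
        return cls(
            start=Point.from_dict(as_dict["start"]),
            end=Point.from_dict(as_dict["end"]),
        )


segment = Segment.from_dict({"start": {"x": 0, "y": 0}, "end": {"x": 3, "y": 4}})
print(segment.end)  # Point(x=3, y=4)
```

Without the override, the default would pass the inner dictionaries straight to the constructor, leaving `start` and `end` as plain dicts.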
- classmethod from_json_file(fp: Path) -> JSONABLE_INSTANCE_T
Loads the object from a JSON file.
Reads a JSON file and converts it into an object of the calling class.
- Parameters:
fp (Path) – The path of the JSON file to read.
- Returns:
An instance of the calling class.
- Raises:
FileNotFoundError – If the passed file path does not exist.
Examples
>>> import dataclasses
>>> import tempfile
>>> from pathlib import Path
>>> @dataclasses.dataclass
... class MyData(JSONableMixin):
...     name: str
>>> with tempfile.TemporaryDirectory() as tmp_dir:
...     fp = Path(tmp_dir) / 'test.json'
...     with open(fp, mode='w') as f:
...         _ = f.write('{"name": "Test"}')
...     data = MyData.from_json_file(fp)
>>> data.to_dict()
{'name': 'Test'}
>>> with tempfile.TemporaryDirectory() as tmp_dir:
...     fp = Path(tmp_dir) / 'test.json'
...     MyData.from_json_file(fp)
Traceback (most recent call last):
    ...
FileNotFoundError: ...test.json...
- to_dict() -> dict[str, Any]
Converts the object into a dictionary.
If the calling object is a dataclasses.dataclass, then this method just calls dataclasses.asdict. Otherwise, this method must be implemented by the subclass.
- Returns:
A dictionary representation of the object.
- Raises:
NotImplementedError – If the object is not a dataclass and the subclass does not implement this method.
Examples
>>> import dataclasses
>>> @dataclasses.dataclass
... class MyData(JSONableMixin):
...     name: str
>>> MyData('Test').to_dict()
{'name': 'Test'}
>>> class MyData(JSONableMixin):
...     def __init__(self, name: str):
...         self.name = name
...     def to_dict(self) -> dict[str, str]:
...         return {"name": self.name}
>>> MyData('Test2').to_dict()
{'name': 'Test2'}
>>> class MyData(JSONableMixin):
...     def __init__(self, name: str):
...         self.name = name
>>> MyData('Test2').to_dict()
Traceback (most recent call last):
    ...
NotImplementedError: This must be overwritten in non-dataclass derived classes!
- to_json_file(fp: Path, do_overwrite: bool = False)
Writes the object to a JSON file.
Serializes the object as JSON and writes it to a file.
- Parameters:
fp (Path) – The path of the file to write.
do_overwrite (bool) – Whether to overwrite the file if it already exists. Defaults to False.
- Raises:
FileExistsError – If the file already exists and do_overwrite is False.
Examples
>>> import dataclasses
>>> import tempfile
>>> from pathlib import Path
>>> @dataclasses.dataclass
... class MyData(JSONableMixin):
...     name: str
>>> data = MyData('Test')
>>> with tempfile.TemporaryDirectory() as tmp_dir:
...     fp = Path(tmp_dir) / 'test.json'
...     data.to_json_file(fp, do_overwrite=False)
...     with open(fp, mode='r') as f:
...         f.read()
'{"name": "Test"}'
>>> with tempfile.TemporaryDirectory() as tmp_dir:
...     fp = Path(tmp_dir) / 'test.json'
...     fp.touch()
...     data.to_json_file(fp, do_overwrite=False)
Traceback (most recent call last):
    ...
FileExistsError: ...test.json exists and do_overwrite = False
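To show how the save and load methods fit together, here is a minimal round-trip sketch. It does not import the library: `_MixinSketch` is a hypothetical stand-in written only from the behavior documented above (dataclass conversion via dataclasses.asdict, the do_overwrite check, and loading through the constructor), so the real implementation may differ in detail.

```python
import dataclasses
import json
import tempfile
from pathlib import Path


class _MixinSketch:
    """Hypothetical stand-in mimicking the documented behavior; not the library's code."""

    def to_dict(self) -> dict:
        if dataclasses.is_dataclass(self):
            return dataclasses.asdict(self)
        raise NotImplementedError("This must be overwritten in non-dataclass derived classes!")

    def to_json_file(self, fp: Path, do_overwrite: bool = False) -> None:
        # Refuse to clobber an existing file unless explicitly allowed.
        if fp.exists() and not do_overwrite:
            raise FileExistsError(f"{fp} exists and do_overwrite = {do_overwrite}")
        fp.write_text(json.dumps(self.to_dict()))

    @classmethod
    def from_json_file(cls, fp: Path):
        # Path.read_text raises FileNotFoundError if fp does not exist.
        return cls(**json.loads(fp.read_text()))


@dataclasses.dataclass
class MyData(_MixinSketch):
    name: str


with tempfile.TemporaryDirectory() as tmp_dir:
    fp = Path(tmp_dir) / "my_data.json"
    original = MyData("Test")
    original.to_json_file(fp)
    restored = MyData.from_json_file(fp)

    # A second write to the same path fails unless do_overwrite=True is passed.
    try:
        original.to_json_file(fp)
        second_write_failed = False
    except FileExistsError:
        second_write_failed = True
    original.to_json_file(fp, do_overwrite=True)

print(restored == original)  # True
print(second_write_failed)  # True
```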
- class EventStream.utils.StrEnum(value)
An enum object where members are stored as lowercase strings and can be used as strings.
StrEnum is a Python enum.Enum that inherits from str. Its members therefore compare equal to plain string objects, which makes it suitable for configuration values parsed from the command line or other string arguments. The default auto() behavior uses the member name, lowercased, as its value.
This code is sourced from https://github.com/irgeek/StrEnum/blob/0f868b68cb7cdab50a79117679a301f550a324bc/strenum/__init__.py#L21 and is made obsolete by Python 3.11, which has enum.StrEnum natively.
- Raises:
TypeError – If given enum member values are not strings.
Examples
>>> from enum import auto
>>> class Example(StrEnum):
...     UPPER_CASE = auto()
...     lower_case = auto()
...     MixedCaseFixed = "MixedCaseFixed"
>>> assert Example.UPPER_CASE == "upper_case"
>>> assert Example.lower_case == "lower_case"
>>> assert Example.MixedCaseFixed == "MixedCaseFixed"
>>> class Example(StrEnum):
...     VAR_1 = 1
Traceback (most recent call last):
    ...
TypeError: Values of StrEnums must be strings: 1 is a <class 'int'>
- classmethod values()
Returns a list of enum class member options as strings.
This method gives a list of possible options in the calling class; it is useful for validating that a string is a member of the enum.
- Returns:
A list of enum members as strings.
Examples
>>> from enum import auto
>>> class Example(StrEnum):
...     UPPER_CASE = auto()
...     lower_case = auto()
...     MixedCase = auto()
>>> Example.values()
['upper_case', 'lower_case', 'mixedcase']
>>> class Example(StrEnum):
...     var1 = "VAR_1"
...     Var2 = auto()
>>> Example.values()
['VAR_1', 'var2']
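The validation use case mentioned for values() might look like the sketch below. To stay self-contained it uses `_StrEnumSketch`, a hypothetical stand-in that reproduces only the two behaviors documented above (lowercased auto() values and a values() helper); `Split` and `parse_split` are likewise invented for illustration.

```python
from enum import Enum, auto


class _StrEnumSketch(str, Enum):
    """Hypothetical stand-in: lowercased auto() values plus a values() helper."""

    def _generate_next_value_(name, start, count, last_values):
        # auto() resolves to the lowercased member name.
        return name.lower()

    @classmethod
    def values(cls) -> list:
        return [member.value for member in cls]


class Split(_StrEnumSketch):
    TRAIN = auto()
    HELD_OUT = auto()


def parse_split(raw: str) -> Split:
    """Validates a raw string (e.g. from the command line) before converting it."""
    if raw not in Split.values():
        raise ValueError(f"{raw!r} is not one of {Split.values()}")
    return Split(raw)


print(Split.values())  # ['train', 'held_out']
print(parse_split("held_out") == "held_out")  # True
```

Checking against values() first gives a readable error message that lists the allowed options, instead of the generic ValueError raised by the enum lookup itself.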
- EventStream.utils.count_or_proportion(N: int | Expr | None, cnt_or_prop: int | float) -> int
Returns cnt_or_prop if it is an integer or int(round(N*cnt_or_prop)) if it is a float.
Resolves cutoff variables that can be passed either as integer counts or as fractions of a whole. For example, a vocabulary might keep only elements that occur with count or proportion at least X, where X might be 20 occurrences or 1%.
- Parameters:
N (int | Expr | None) – The size of the whole. Needed only when cnt_or_prop is a float.
cnt_or_prop (int | float) – The cutoff, either a positive integer count or a float proportion between 0 and 1.
- Returns:
The cutoff value as an integer count of the whole.
- Raises:
TypeError – If cnt_or_prop is not an integer or a float, or if N is needed and is not an integer or a polars Expression.
ValueError – If cnt_or_prop is not a positive integer or a float between 0 and 1.
Examples
>>> count_or_proportion(100, 0.1)
10
>>> count_or_proportion(None, 11)
11
>>> count_or_proportion(100, 0.116)
12
>>> count_or_proportion(None, 0)
Traceback (most recent call last):
    ...
ValueError: 0 must be positive if it is an integer
>>> count_or_proportion(None, 1.3)
Traceback (most recent call last):
    ...
ValueError: 1.3 must be between 0 and 1 if it is a float
>>> count_or_proportion(None, "a")
Traceback (most recent call last):
    ...
TypeError: a must be a positive integer or a float between 0 or 1
>>> count_or_proportion("a", 0.2)
Traceback (most recent call last):
    ...
TypeError: a must be an integer or a polars.Expr when cnt_or_prop is a float!
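The resolution logic can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the library's source: `count_or_proportion_sketch` is a hypothetical name, it handles only integer N (the polars Expr case is omitted), it treats the 0 and 1 bounds as exclusive, and it rounds to the nearest integer because the example above maps 0.116 of 100 to 12.

```python
from typing import Optional, Union


def count_or_proportion_sketch(N: Optional[int], cnt_or_prop: Union[int, float]) -> int:
    """Illustrative sketch for integer N only; not the library's source."""
    if isinstance(cnt_or_prop, int):
        if cnt_or_prop <= 0:
            raise ValueError(f"{cnt_or_prop} must be positive if it is an integer")
        return cnt_or_prop  # Counts pass through unchanged; N is not consulted.
    if not isinstance(cnt_or_prop, float):
        raise TypeError(f"{cnt_or_prop} must be a positive integer or a float between 0 and 1")
    # Assumption: both bounds are exclusive.
    if not 0 < cnt_or_prop < 1:
        raise ValueError(f"{cnt_or_prop} must be between 0 and 1 if it is a float")
    if not isinstance(N, int):
        raise TypeError(f"{N} must be an integer when cnt_or_prop is a float")
    # Proportions are scaled by the whole and rounded to the nearest integer.
    return int(round(cnt_or_prop * N))


# A vocabulary cutoff of "20 occurrences" or "1% of 2000 observations" resolves to the same count.
print(count_or_proportion_sketch(None, 20))  # 20
print(count_or_proportion_sketch(2000, 0.01))  # 20
print(count_or_proportion_sketch(100, 0.116))  # 12
```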
- EventStream.utils.hydra_dataclass(dataclass: Any) -> Any
Decorator that allows you to use a dataclass as a Hydra config via the ConfigStore.
Adds the decorated dataclass as a Hydra StructuredConfig object to the Hydra ConfigStore. The name of the stored config in the ConfigStore is the snake case version of the CamelCase class name.
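The naming rule (CamelCase class name to snake case config name) can be illustrated with a small conversion helper. This is only a sketch of one common conversion: `camel_to_snake_sketch` and `MyModelConfig` are hypothetical, and the library's exact rule, in particular for acronyms and digits, is an assumption here.

```python
import re


def camel_to_snake_sketch(name: str) -> str:
    """One common CamelCase to snake_case rule; the library's exact rule may differ."""
    # Split where a lowercase letter or digit is followed by a capital: "MyModel" -> "My_Model".
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split an acronym from the word that follows it: "HTTPServer" -> "HTTP_Server".
    with_breaks = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", with_breaks)
    return with_breaks.lower()


class MyModelConfig:  # Hypothetical config class used only for its name.
    pass


print(camel_to_snake_sketch(MyModelConfig.__name__))  # my_model_config
```

Under this assumption, a decorated class named `MyModelConfig` would be stored under `my_model_config`, which is the name a Hydra defaults list or command-line override would then refer to.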
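- EventStream.utils.lt_count_or_proportion(N_obs: int, cnt_or_prop: int | float | None, N_total: int | None = None) -> bool
Returns True if N_obs is less than the cnt_or_prop of N_total.
- Parameters:
N_obs (int) – The observed count to compare against the cutoff.
cnt_or_prop (int | float | None) – The cutoff, either an integer count or a float proportion of N_total. If None, no cutoff applies.
N_total (int | None) – The size of the whole. Needed only when cnt_or_prop is a proportion. Defaults to None.
- Returns:
If cnt_or_prop is None, returns False. Otherwise, returns True if N_obs is less than cnt_or_prop (if it is a count) or less than int(round(cnt_or_prop*N_total)) (if it is a proportion).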
Examples
>>> lt_count_or_proportion(10, 0.1, 100)
False
>>> lt_count_or_proportion(10, 0.11, 100)
True
>>> lt_count_or_proportion(10, 11)
True
>>> lt_count_or_proportion(10, 9)
False
>>> lt_count_or_proportion(10, None)
False
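A typical use of this comparison is dropping rare items. The sketch below re-implements the documented return rule under a hypothetical name (`lt_count_or_proportion_sketch`) without input validation, and applies it to invented vocabulary counts; it is an illustration, not the library's code.

```python
from typing import Optional, Union


def lt_count_or_proportion_sketch(
    N_obs: int, cnt_or_prop: Optional[Union[int, float]], N_total: Optional[int] = None
) -> bool:
    """Illustrative sketch of the documented return rule; performs no input validation."""
    if cnt_or_prop is None:
        return False  # No cutoff configured, so nothing counts as below it.
    if isinstance(cnt_or_prop, int):
        return N_obs < cnt_or_prop
    # Proportion: convert to a count of the whole before comparing.
    return N_obs < int(round(cnt_or_prop * N_total))


# Invented example data: keep vocabulary elements seen in at least 5% of observations.
counts = {"a": 50, "b": 3, "c": 1}
total = sum(counts.values())  # 54, so the 5% cutoff resolves to round(2.7) = 3
kept = [k for k, n in counts.items() if not lt_count_or_proportion_sketch(n, 0.05, total)]
print(kept)  # ['a', 'b']
```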
- EventStream.utils.num_initial_spaces(s: str) -> int
Returns the number of initial spaces in s.
Examples
>>> num_initial_spaces("  a")
2
>>> num_initial_spaces("lorem ipsum ")
0
- EventStream.utils.task_wrapper(task_func: Callable) -> Callable
Optional decorator that controls the failure behavior when executing the task function.
It ensures that Weights & Biases finishes tracking any active runs, even if the task raises an exception, to avoid multi-run failures caused by Weights & Biases errors.
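The general pattern behind such a wrapper is a try/finally around the task call. The sketch below shows that pattern only: `task_wrapper_sketch` is a hypothetical name, `finish_tracking` is a stand-in for closing the tracker's active run (in a real setup this is where a call such as wandb.finish() would plausibly go), and whether the library re-raises the exception is an assumption.

```python
import functools
from typing import Callable

finished_runs = []  # Records cleanup calls so the sketch can be checked without a tracker.


def finish_tracking() -> None:
    """Hypothetical stand-in for closing the experiment tracker's active run."""
    finished_runs.append("finished")


def task_wrapper_sketch(task_func: Callable) -> Callable:
    @functools.wraps(task_func)
    def wrapped(*args, **kwargs):
        try:
            return task_func(*args, **kwargs)
        finally:
            # Runs on success and on failure; any exception still propagates afterwards.
            finish_tracking()

    return wrapped


@task_wrapper_sketch
def failing_task():
    raise RuntimeError("training failed")


try:
    failing_task()
except RuntimeError as err:
    print(f"task raised: {err}")  # task raised: training failed

print(finished_runs)  # ['finished']
```

Because the cleanup sits in the finally clause, a sweep that launches several tasks in one process starts each one with no stale run left open by a task that crashed.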