EventStream.data.visualize module

class EventStream.data.visualize.Visualizer(subset_size: int | None = None, subset_random_seed: int | None = None, static_covariates: list[str] = <factory>, plot_by_time: bool = True, time_unit: str | None = '1y', plot_by_age: bool = False, age_col: str | None = None, dob_col: str | None = None, n_age_buckets: int | None = 200)

Bases: JSONableMixin

A visualization configuration and plotting class.

This class helps visualize Dataset objects. It serves both as a configuration object and as the component that performs the data manipulations needed for the final visualizations, interfacing only with the Dataset object to obtain appropriately sampled and processed cuts of the data. It currently produces the plots listed below. All plots are broken down by static_covariates, which are covariates that are constant for each subject.

## Analyzing the data over time (only produced if plot_by_time is True)

Given an $x$-axis of time $t$, the following plots are produced:

  • “Active Subjects”: $y$ = the number of active subjects at time $t$ (i.e., the number of subjects who have at least one event before $t$ and whose last event has not yet occurred at $t$).

  • “Cumulative Subjects”: $y$ = the number of cumulative subjects at time $t$ (i.e., the number of subjects who have at least one event before $t$).

  • “Cumulative Events”: $y$ = the number of events the dataset would obtain were it to be terminated at time $t$.

  • “Events / Subject”: $y$ = the average number of events per subject as would be observed were the dataset to be terminated at time $t$.

  • “Events / (Subject, Time)”: $y$ = the average rate of events per unit time per subject at time $t$.

  • “Age Distribution over Time”: A 2D density heatmap showing the distribution of ages of the active subjects in the dataset at time $t$. Only produced if age_col is specified. Ages are binned into n_age_buckets buckets.
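As a rough illustration of these definitions, the sketch below computes the time-based quantities in plain Python. It is not EventStream's implementation. It assumes events arrive as hypothetical `(subject_id, event_date)` pairs, buckets time by calendar year (standing in for `time_unit='1y'`), treats a subject as active in year $t$ if its first event falls in or before $t$ and its last event in or after $t$, and reads "Events / (Subject, Time)" as the events in a bucket divided by the active subjects in that bucket. Static covariates and the age heatmap are omitted.

```python
from collections import defaultdict
from datetime import date

# Hypothetical input format: one (subject_id, event_date) pair per event.
Event = tuple[str, date]


def metrics_by_year(events: list[Event]) -> dict[int, dict[str, float]]:
    """Sketch of the over-time metrics, using calendar years as time buckets."""
    if not events:
        return {}
    first: dict[str, int] = {}
    last: dict[str, int] = {}
    events_per_year: dict[int, int] = defaultdict(int)
    for subject, when in events:
        year = when.year
        first[subject] = min(first.get(subject, year), year)
        last[subject] = max(last.get(subject, year), year)
        events_per_year[year] += 1

    out: dict[int, dict[str, float]] = {}
    cumulative_events = 0
    for year in range(min(events_per_year), max(events_per_year) + 1):
        cumulative_events += events_per_year[year]  # years with no events add 0
        cumulative_subjects = sum(1 for f in first.values() if f <= year)
        # Assumed "active" rule: first event in/before the year, last event in/after it.
        active = sum(1 for s in first if first[s] <= year <= last[s])
        out[year] = {
            "active_subjects": active,
            "cumulative_subjects": cumulative_subjects,
            "cumulative_events": cumulative_events,
            "events_per_subject": cumulative_events / cumulative_subjects,
            # Assumed rate: events within this one-year bucket per active subject.
            "events_per_subject_year": events_per_year[year] / active if active else 0.0,
        }
    return out
```

On a toy dataset where subject `a` has events in 2000 and 2001 and subject `b` has events in 2001 and 2003, this reports two active subjects in 2001 and one in 2002, since `a` is no longer active after its last event.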

## Analyzing the data over age (only produced if plot_by_age is True)

Given an $x$-axis of age bucket $a$, the following plots are produced:

  • “Cumulative Subjects”: $y$ = the number of subjects in the dataset who have an event in the age bucket $a$.

  • “Cumulative Events”: $y$ = the number of events included in the dataset that occur at or before age bucket $a$.

  • “Events / Subject”: $y$ = the average number of events per subject that occur when the subject is at age bucket $a$.
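A similar sketch for the age-based plots, under the same caveats: it assumes each event has already been mapped to an integer age bucket and is given as a hypothetical `(subject_id, age_bucket)` pair. It reads "Events / Subject" as the events in bucket $a$ divided by the number of subjects with at least one event in $a$. This is one plausible interpretation of the description above, not a confirmed one.

```python
from collections import defaultdict


def metrics_by_age_bucket(events: list[tuple[str, int]]) -> dict[int, dict[str, float]]:
    """Summaries per age bucket from hypothetical (subject_id, age_bucket) pairs."""
    subjects_in_bucket: dict[int, set[str]] = defaultdict(set)
    events_in_bucket: dict[int, int] = defaultdict(int)
    for subject, bucket in events:
        subjects_in_bucket[bucket].add(subject)
        events_in_bucket[bucket] += 1

    out: dict[int, dict[str, float]] = {}
    running_events = 0
    # Only buckets that contain at least one event are reported.
    for bucket in sorted(events_in_bucket):
        running_events += events_in_bucket[bucket]
        n_subjects = len(subjects_in_bucket[bucket])
        out[bucket] = {
            "subjects": n_subjects,  # subjects with >= 1 event in this bucket
            "cumulative_events": running_events,  # events at or before this bucket
            "events_per_subject": events_in_bucket[bucket] / n_subjects,
        }
    return out
```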

subset_size

When plotting, use an IID random subsample (over subjects) of this size from the input dataset. This makes plotting much faster and is statistically unbiased, though it can increase variance.

Type:

int | None

subset_random_seed

If subsampling the raw data, use this random seed to control that subsampling.

Type:

int | None
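The subsampling controlled by subset_size and subset_random_seed can be sketched with Python's `random` module. The helper name `subsample_subjects` is hypothetical. It draws subjects uniformly without replacement and sorts the IDs first, so that for a fixed seed the result does not depend on input order. The returned subject list would then be used to filter the events before plotting.

```python
from __future__ import annotations

import random


def subsample_subjects(
    subject_ids: list[str], subset_size: int | None, subset_random_seed: int | None
) -> list[str]:
    """Return a reproducible uniform sample of subjects (hypothetical helper)."""
    if subset_size is None:
        return list(subject_ids)  # no subsampling requested
    if subset_random_seed is None:
        raise ValueError("subset_size is specified, but subset_random_seed is not!")
    if subset_size >= len(subject_ids):
        return list(subject_ids)  # nothing to drop
    rng = random.Random(subset_random_seed)
    # Sort first so the sample depends only on the seed, not on input order.
    return rng.sample(sorted(subject_ids), subset_size)
```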

static_covariates

When plotting, split plots by these static covariates.

Type:

list[str]
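Splitting plots by static_covariates amounts to grouping subjects by the tuple of their covariate values, after which each group's events can be summarized separately. A minimal sketch, assuming subjects are given as a hypothetical mapping from subject ID to a dict of static attributes; a subject missing a covariate lands in a group keyed by `None` for that position.

```python
from collections import defaultdict


def group_by_static_covariates(
    subjects: dict[str, dict[str, object]], static_covariates: list[str]
) -> dict[tuple, list[str]]:
    """Group hypothetical subject records by their static covariate values."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for subject_id, record in subjects.items():
        # One group per distinct combination of covariate values.
        key = tuple(record.get(col) for col in static_covariates)
        groups[key].append(subject_id)
    return dict(groups)
```

With an empty static_covariates list, every subject falls into the single group `()`, which corresponds to an unsplit plot.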

plot_by_time

If True, also plot how the dataset changes over time.

Type:

bool

time_unit

If plot_by_time is True, aggregate timepoints into buckets of this size.

Type:

str | None

plot_by_age

If True, plot how dataset characteristics evolve with subject age.

Type:

bool

age_col

The column in the Dataset’s events_df where age is stored. This should typically be the name of the measurement that uses the AgeFunctor time-dependent functor object, unless age is precomputed in the dataset.

Type:

str | None

dob_col

The column in the subjects dataframe that stores each subject’s date of birth (in datetime format). It is used to compute subject ages at the inferred timepoints created dynamically during plotting.

Type:

str | None
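Computing an age at an inferred timepoint reduces to subtracting the date of birth from that timepoint. The sketch below is an approximation that assumes a 365.25-day year. The function name `age_in_years` is hypothetical, and EventStream may compute ages differently (for example, through the AgeFunctor).

```python
from datetime import datetime


def age_in_years(dob: datetime, at: datetime) -> float:
    """Approximate age at `at`, assuming a 365.25-day year."""
    seconds_per_year = 365.25 * 24 * 3600
    return (at - dob).total_seconds() / seconds_per_year
```

For an inferred timepoint such as a time-bucket boundary, one would call this with the subject's dob_col value and the boundary timestamp.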

n_age_buckets

If plot_by_age is True, this sets the number of buckets into which ages are discretized, limiting plot granularity.

Type:

int | None
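One simple way to discretize ages into n_age_buckets buckets is equal-width binning over the observed age range. This is an assumed scheme for illustration; the source does not specify how bucket edges are chosen, and `bucketize_ages` is a hypothetical name.

```python
def bucketize_ages(ages: list[float], n_age_buckets: int) -> list[int]:
    """Map each age to an equal-width bucket index in [0, n_age_buckets - 1]."""
    lo, hi = min(ages), max(ages)
    if hi == lo:
        return [0] * len(ages)  # a single distinct age fits in one bucket
    width = (hi - lo) / n_age_buckets
    # min(...) keeps the maximum age in the last bucket instead of overflowing.
    return [min(int((a - lo) / width), n_age_buckets - 1) for a in ages]
```

Fewer buckets yield coarser but less noisy plots; the default of 200 trades some noise for finer age resolution.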

Raises:

ValueError – If any of the following hold:

  • subset_size is specified but subset_random_seed is not.

  • plot_by_age is True, but age_col or n_age_buckets is None.

  • age_col is specified but dob_col is not.

  • plot_by_time is True, but time_unit is None.
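The checks listed above can be mirrored in a plain dataclass with a `__post_init__` hook; `field(default_factory=list)` is what the `<factory>` default in the signature denotes. `VisualizerConfigSketch` is a hypothetical stand-in, not the real class. Its error messages are copied from the examples where they exist, while the message for a missing age_col is invented. The check order is chosen so that the example with `age_col='age'` and `n_age_buckets=None` reports the n_age_buckets error first, matching the documented output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VisualizerConfigSketch:
    """Hypothetical stand-in mirroring the documented constructor checks."""

    subset_size: int | None = None
    subset_random_seed: int | None = None
    static_covariates: list[str] = field(default_factory=list)
    plot_by_time: bool = True
    time_unit: str | None = "1y"
    plot_by_age: bool = False
    age_col: str | None = None
    dob_col: str | None = None
    n_age_buckets: int | None = 200

    def __post_init__(self) -> None:
        if self.subset_size is not None and self.subset_random_seed is None:
            raise ValueError("subset_size is specified, but subset_random_seed is not!")
        if self.plot_by_age:
            if self.age_col is None:
                # Invented message; the source shows no example for this case.
                raise ValueError("plot_by_age is True, but age_col is unspecified!")
            if self.n_age_buckets is None:
                raise ValueError("plot_by_age is True, but n_age_buckets is unspecified!")
        if self.age_col is not None and self.dob_col is None:
            raise ValueError("age_col is specified, but dob_col is not!")
        if self.plot_by_time and self.time_unit is None:
            raise ValueError("plot_by_time is True, but time_unit is unspecified!")
```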

Examples

>>> V = Visualizer()
>>> V = Visualizer(
...     subset_size=100, subset_random_seed=1,
...     plot_by_age=True, age_col='age', dob_col='dob', n_age_buckets=100,
...     plot_by_time=True, time_unit='1y',
... )
>>> V = Visualizer(subset_size=100)
Traceback (most recent call last):
    ...
ValueError: subset_size is specified, but subset_random_seed is not!
>>> V = Visualizer(plot_by_age=True, age_col='age', n_age_buckets=None)
Traceback (most recent call last):
    ...
ValueError: plot_by_age is True, but n_age_buckets is unspecified!
>>> V = Visualizer(age_col='age')
Traceback (most recent call last):
    ...
ValueError: age_col is specified, but dob_col is not!
>>> V = Visualizer(plot_by_time=True, time_unit=None)
Traceback (most recent call last):
    ...
ValueError: plot_by_time is True, but time_unit is unspecified!
age_col : str | None = None
dob_col : str | None = None
n_age_buckets : int | None = 200
plot(subjects_df: DataFrame, events_df: DataFrame, dynamic_measurements_df: DataFrame) → list[Figure]
plot_by_age : bool = False
plot_by_time : bool = True
plot_counts_over_age(events_df: DataFrame) → list[Figure]
plot_counts_over_time(in_events_df: DataFrame) → list[Figure]
plot_events_per_patient(events_df: DataFrame) → list[Figure]
plot_static_variables_breakdown(static_variables: DataFrame) → list[Figure]
static_covariates : list[str]
subset_random_seed : int | None = None
subset_size : int | None = None
time_unit : str | None = '1y'