EventStream.data.visualize module
- class EventStream.data.visualize.Visualizer(subset_size: int | None = None, subset_random_seed: int | None = None, static_covariates: list[str] = <factory>, plot_by_time: bool = True, time_unit: str | None = '1y', plot_by_age: bool = False, age_col: str | None = None, dob_col: str | None = None, n_age_buckets: int | None = 200, min_sub_to_plot_age_dist: int | None = 50)
Bases: JSONableMixin
A visualization configuration and plotting class.
This class helps visualize Dataset objects. It serves as a configuration object and also performs the actual data manipulations for final visualization, interfacing with the Dataset object only to obtain appropriately sampled and processed cuts of the data to visualize. It currently produces the following plots. All plots are broken down by static_covariates, which are covariates that are constant for each subject.
## Analyzing the data over time (only produced if plot_by_time is True)
Given an $x$-axis of time $t$, the following plots are produced:
“Active Subjects”: $y$ = the number of active subjects at time $t$ (i.e., the number of subjects who have at least one event before $t$ and have not yet had their last event at $t$).
“Cumulative Subjects”: $y$ = the number of cumulative subjects at time $t$ (i.e., the number of subjects who have at least one event before $t$).
“Cumulative Events”: $y$ = the number of events the dataset would obtain were it to be terminated at time $t$.
“Events / Subject”: $y$ = the average number of events per subject as would be observed were the dataset to be terminated at time $t$.
“Events / (Subject, Time)”: $y$ = the average rate of events per unit time per subject at time $t$.
“Age Distribution over Time”: A 2D Density Heatmap plot showing the distributions of the ages of active subjects in the dataset at time $t$. Only produced if age_col is specified. Age is binned into n_age_buckets buckets.
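As a rough illustration of the time-based quantities above, the following sketch computes them for a toy event list at a single time $t$. It is not the library's implementation: the function `summarize_at`, the toy data, the integer timestamps, and the boundary handling (an event exactly at $t$ counts as having occurred, and a subject stops being active once $t$ reaches their last event) are all assumptions made for the example. “Events / Subject” is taken here to be cumulative events divided by cumulative subjects, which is one plausible reading of the definition.

```python
from collections import defaultdict

# Toy (subject_id, event_time) pairs; times are plain integers for simplicity.
events = [
    ("a", 1), ("a", 4), ("a", 9),
    ("b", 3), ("b", 5),
    ("c", 6), ("c", 7), ("c", 12),
]


def summarize_at(events, t):
    """Hypothetical helper: summary statistics as if the dataset ended at time t."""
    times_by_subject = defaultdict(list)
    for subject_id, event_time in events:
        times_by_subject[subject_id].append(event_time)

    # Subjects whose first event has occurred by time t.
    cumulative_subjects = sum(
        1 for ts in times_by_subject.values() if min(ts) <= t
    )
    # Subjects who have started by t and whose last event is still ahead of t.
    active_subjects = sum(
        1 for ts in times_by_subject.values() if min(ts) <= t < max(ts)
    )
    # Events that would be retained if the dataset were cut off at t.
    cumulative_events = sum(1 for _, event_time in events if event_time <= t)

    events_per_subject = (
        cumulative_events / cumulative_subjects if cumulative_subjects else 0.0
    )
    return {
        "active_subjects": active_subjects,
        "cumulative_subjects": cumulative_subjects,
        "cumulative_events": cumulative_events,
        "events_per_subject": events_per_subject,
    }


summary = summarize_at(events, 6)
```

Evaluating `summarize_at` over a grid of time buckets (for example, one per `time_unit`) would give the $y$ values for each plot along the time axis.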
## Analyzing the data over age (only produced if plot_by_age is True)
Given an $x$-axis of age bucket $a$, the following plots are produced:
“Cumulative Subjects”: $y$ = the number of subjects in the dataset who have an event in the age bucket $a$.
“Cumulative Events”: $y$ = the number of events included in the dataset that occur at or before age bucket $a$.
“Events / Subject”: $y$ = the average number of events per subject that occur when the subject is at age bucket $a$.
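The age-based plots depend on assigning each event to an age bucket. The sketch below shows one way this could work, assuming ages are computed in years from a date of birth and binned into equal-width buckets. The helper names (`age_in_years`, `bucket_index`), the 365.25-day year, the fixed age range of 0 to 100, and the toy data are all assumptions for illustration; the library's actual bucketing scheme is not described here and may differ.

```python
from collections import defaultdict
from datetime import datetime

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def age_in_years(dob, timestamp):
    """Hypothetical helper: age in fractional years at `timestamp`."""
    return (timestamp - dob).total_seconds() / SECONDS_PER_YEAR


def bucket_index(age, min_age, max_age, n_buckets):
    """Hypothetical helper: equal-width bucket index, clamped to [0, n_buckets - 1]."""
    width = (max_age - min_age) / n_buckets
    idx = int((age - min_age) / width)
    return max(0, min(idx, n_buckets - 1))


# Toy data: dates of birth per subject and (subject_id, timestamp) events.
dobs = {"a": datetime(1980, 1, 1), "b": datetime(1990, 1, 1)}
events = [
    ("a", datetime(2000, 1, 1)),
    ("a", datetime(2000, 6, 1)),
    ("a", datetime(2020, 1, 1)),
    ("b", datetime(2010, 6, 1)),
]

subjects_in_bucket = defaultdict(set)
events_in_bucket = defaultdict(int)
for subject_id, timestamp in events:
    age = age_in_years(dobs[subject_id], timestamp)
    bucket = bucket_index(age, min_age=0.0, max_age=100.0, n_buckets=10)
    subjects_in_bucket[bucket].add(subject_id)
    events_in_bucket[bucket] += 1

# Per-bucket average number of events among subjects observed in that bucket.
events_per_subject = {
    bucket: events_in_bucket[bucket] / len(subjects_in_bucket[bucket])
    for bucket in events_in_bucket
}
```

With a larger `n_buckets` the curves become finer-grained but noisier, which is the trade-off the n_age_buckets setting controls.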
- subset_size
When plotting, use an IID random subsample (over subjects) of the input dataset of this size. This makes plotting much faster and is statistically unbiased, though it can increase variance.
- subset_random_seed
If subsampling the raw data, use this random seed to control that subsampling.
- time_unit
If plot_by_time is True, aggregate timepoints into buckets of this size.
- age_col
The column in the Dataset’s events_df where age is stored. This should typically be the name of the measurement employing the AgeFunctor time-dependent functor object, unless age is pre-computed in the dataset.
- dob_col
This is used to compute ages of subjects at inferred timepoints created dynamically during plotting. This string should point to the date of birth (in datetime format) within the subjects dataframe.
- n_age_buckets
If plot_by_age is True, this controls how many buckets ages are discretized into to limit plot granularity.
- min_sub_to_plot_age_dist
If set, do not plot sub-population distributions over age if the total number of patients in the sub-population is below this value. Useful for limiting variance.
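The subset_size and subset_random_seed attributes describe a seeded random subsample taken over subjects rather than over events. A minimal sketch of that idea, using only the standard library, is shown below. The function `sample_subjects` is hypothetical and is not the library's sampling code; it only illustrates why a fixed seed makes the subsample reproducible.

```python
import random


def sample_subjects(subject_ids, subset_size, subset_random_seed):
    """Hypothetical helper: seeded uniform subsample of distinct subject IDs."""
    unique_ids = sorted(set(subject_ids))
    # Nothing to do if no subsampling is requested or the dataset is already small.
    if subset_size is None or subset_size >= len(unique_ids):
        return unique_ids
    # A dedicated Random instance keeps the draw independent of global state.
    rng = random.Random(subset_random_seed)
    return sorted(rng.sample(unique_ids, subset_size))


all_subjects = list(range(1000))
subset_a = sample_subjects(all_subjects, 100, 1)
subset_b = sample_subjects(all_subjects, 100, 1)
```

Here `subset_a` and `subset_b` are identical because both draws use the same seed. All events for the selected subjects would then be kept, so per-subject statistics remain unbiased.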
- Raises:
ValueError – If any of the following holds:
* subset_size is specified but subset_random_seed is not.
* plot_by_age is True, but age_col or n_age_buckets is None.
* age_col is specified but dob_col is not.
* plot_by_time is True, but time_unit is None.
Examples
>>> V = Visualizer()
>>> V = Visualizer(
...     subset_size=100, subset_random_seed=1,
...     plot_by_age=True, age_col='age', dob_col='dob', n_age_buckets=100,
...     plot_by_time=True, time_unit='1y',
... )
>>> V = Visualizer(subset_size=100)
Traceback (most recent call last):
    ...
ValueError: subset_size is specified, but subset_random_seed is not!
>>> V = Visualizer(plot_by_age=True, age_col='age', n_age_buckets=None)
Traceback (most recent call last):
    ...
ValueError: plot_by_age is True, but n_age_buckets is unspecified!
>>> V = Visualizer(age_col='age')
Traceback (most recent call last):
    ...
ValueError: age_col is specified, but dob_col is not!
>>> V = Visualizer(plot_by_time=True, time_unit=None)
Traceback (most recent call last):
    ...
ValueError: plot_by_time is True, but time_unit is unspecified!
- plot(subjects_df: DataFrame, events_df: DataFrame, dynamic_measurements_df: DataFrame) -> list[Figure]