Diagnostics

Diagnostics utilities.

action_overlap_slices(actions, behavior_action_probs, target_action_probs, *, action_space_n, top_k=5)

Summarize overlap metrics per action.

behavior_calibration_from_metadata(metadata)

Return calibration diagnostics stored in dataset metadata, if present.

compute_overlap_metrics(target_action_probs, behavior_action_probs, mask=None, threshold=0.001)

Compute overlap diagnostics between target and behavior policies.

Estimand

Not applicable.

Assumptions

Logged propensities are accurate and non-zero for observed actions.

Inputs

target_action_probs: Array of probabilities for observed actions.
behavior_action_probs: Array of behavior propensities.
mask: Optional boolean mask for valid steps.
threshold: Minimum acceptable behavior probability.

Outputs

Dictionary of overlap metrics.

Failure modes

If arrays contain zeros, ratios may be infinite.
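To make the inputs and the zero-propensity failure mode concrete, here is a minimal sketch of this kind of overlap check. It is not the library's implementation: the function name `overlap_metrics_sketch` and the dictionary keys are hypothetical, and the choice of metrics (minimum behavior probability, fraction below `threshold`, maximum target/behavior ratio) is an assumption.

```python
import numpy as np


def overlap_metrics_sketch(target_probs, behavior_probs, mask=None, threshold=1e-3):
    """Hypothetical overlap summary; metric names are illustrative only."""
    target = np.asarray(target_probs, dtype=float)
    behavior = np.asarray(behavior_probs, dtype=float)

    # Keep only the steps flagged as valid, if a mask is supplied.
    if mask is not None:
        keep = np.asarray(mask, dtype=bool)
        target, behavior = target[keep], behavior[keep]

    if behavior.size == 0:
        return {"min_behavior_prob": 0.0,
                "fraction_below_threshold": 0.0,
                "max_ratio": 0.0}

    # A zero behavior probability yields an infinite (or NaN) ratio,
    # which is the failure mode described above.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = target / behavior

    return {
        "min_behavior_prob": float(behavior.min()),
        "fraction_below_threshold": float(np.mean(behavior < threshold)),
        "max_ratio": float(np.max(ratios)),
    }
```

A large `fraction_below_threshold` or an infinite `max_ratio` in a sketch like this would suggest that the behavior policy rarely takes actions the target policy prefers.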

effective_sample_size(weights)

Compute effective sample size from non-negative weights.

Estimand

Not applicable.

Assumptions

Weights are non-negative.

Inputs

weights: Array of importance weights.

Outputs

Effective sample size scalar.

Failure modes

Returns 0 if weights are empty or sum to zero.
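The page does not state which formula is used. A common choice is Kish's effective sample size, (sum of weights)^2 / (sum of squared weights). The sketch below assumes that definition and mirrors the documented behavior for empty or all-zero input; the name `kish_ess` is hypothetical.

```python
import numpy as np


def kish_ess(weights):
    """Kish effective sample size; assumed, not confirmed, to match the library."""
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    # Match the documented failure mode: empty or zero-sum weights give 0.
    if w.size == 0 or total == 0.0:
        return 0.0
    return float(total ** 2 / np.sum(w ** 2))


print(kish_ess([1.0, 1.0, 1.0, 1.0]))  # 4.0: uniform weights keep all samples
print(kish_ess([1.0, 0.0, 0.0, 0.0]))  # 1.0: one sample carries all the weight
```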

ess_ratio(weights)

Compute ESS divided by sample count.

Estimand

Not applicable.

Assumptions

Weights are non-negative.

Inputs

weights: Array of importance weights.

Outputs

ESS ratio scalar.

Failure modes

Returns 0 if weights are empty or sum to zero.
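If the ESS is Kish's (an assumption; the page does not say), dividing by the sample count n gives (mean of weights)^2 / (mean of squared weights), a value between 1/n and 1. The sketch below computes the ratio in that form; `ess_ratio_sketch` is a hypothetical name.

```python
import numpy as np


def ess_ratio_sketch(weights):
    """ESS / n under the Kish definition, written in terms of means."""
    w = np.asarray(weights, dtype=float)
    # Documented failure mode: empty or zero-sum weights give 0.
    if w.size == 0 or w.sum() == 0.0:
        return 0.0
    return float(np.mean(w) ** 2 / np.mean(w ** 2))


print(ess_ratio_sketch([1.0, 1.0, 1.0, 1.0]))  # 1.0: no loss from weighting
print(ess_ratio_sketch([4.0, 0.0, 0.0, 0.0]))  # 0.25: equals 1/n, the worst case
```

Because the ratio is scale-invariant, it can be compared across datasets of different sizes and across normalized and unnormalized weights.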

state_shift_diagnostics(states, weights=None, *, max_samples=1000, seed=0)

Estimate state distribution shift using weighted vs. unweighted samples.

timestep_weight_slices(ratios, mask, *, top_k=5)

Summarize importance ratios by timestep.

weight_tail_stats(weights, quantile=0.99, threshold=10.0)

Compute weight tail statistics.

Estimand

Not applicable.

Assumptions

Weights are non-negative.

Inputs

weights: Array of importance weights.
quantile: Quantile level for tail summary.
threshold: Threshold to count extreme weights.

Outputs

Dictionary with tail metrics.

Failure modes

None (returns zeros for empty input).
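As an illustration of how `quantile` and `threshold` might be used, the sketch below reports the weight at the requested quantile, the maximum weight, and the fraction of weights above the threshold. The function name and dictionary keys are hypothetical, and the actual set of metrics returned by the library may differ.

```python
import numpy as np


def tail_stats_sketch(weights, quantile=0.99, threshold=10.0):
    """Hypothetical tail summary; key names are illustrative only."""
    w = np.asarray(weights, dtype=float)
    # Mirror the documented behavior: zeros for empty input.
    if w.size == 0:
        return {"quantile_value": 0.0, "max": 0.0, "fraction_above_threshold": 0.0}
    return {
        "quantile_value": float(np.quantile(w, quantile)),
        "max": float(w.max()),
        "fraction_above_threshold": float(np.mean(w > threshold)),
    }


stats = tail_stats_sketch([1.0, 1.0, 1.0, 20.0], quantile=0.5)
print(stats)
```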

weight_time_diagnostics(weights, mask=None)

Summarize weight behavior over time (per timestep).
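The page gives no input shapes or output fields for this function. As a rough sketch only, assume `weights` is a 2-D array of shape (episodes, timesteps) and `mask` marks valid steps. A per-timestep summary could then report the count, mean weight, and Kish ESS at each step. All names below are hypothetical.

```python
import numpy as np


def per_timestep_summary(weights, mask=None):
    """Per-timestep count, mean, and Kish ESS for (episodes, timesteps) weights."""
    w = np.asarray(weights, dtype=float)
    if mask is None:
        valid = np.ones(w.shape, dtype=bool)
    else:
        valid = np.asarray(mask, dtype=bool)

    # Zero out invalid steps so they do not contribute to the sums.
    masked = np.where(valid, w, 0.0)
    counts = valid.sum(axis=0)
    sums = masked.sum(axis=0)
    sq_sums = (masked ** 2).sum(axis=0)

    # Guard against timesteps with no valid samples or all-zero weights.
    mean = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    ess = np.divide(sums ** 2, sq_sums, out=np.zeros_like(sums), where=sq_sums > 0)
    return {"count": counts, "mean": mean, "ess": ess}
```

A per-timestep ESS that shrinks quickly with the step index would be a typical sign of weight degeneracy over long horizons.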

Plotting

Plotting utilities for diagnostics.

plot_ratio_histogram(ratios, bins=50, *, xlabel='$\\hat{\\nu}$', ylabel='Count', title=None, column='double', aspect=0.55, clip_quantile=None, log_y=False, ax=None)

Journal-ready histogram for target/behavior ratios.

plot_weight_histogram(weights, bins=50, *, xlabel='$\\hat{w}$', ylabel='Count', title=None, column='double', aspect=0.55, clip_quantile=None, log_y=False, ax=None)

Journal-ready histogram for importance weights. Returns the Matplotlib figure (so callers can save/export consistently).
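For readers who want a feel for what such a helper does, here is a minimal Matplotlib sketch, not the library's code. It assumes that `clip_quantile` caps values at the given quantile before binning and that `aspect` is the height-to-width ratio. The 7-inch default width and the name `weight_histogram_sketch` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def weight_histogram_sketch(weights, bins=50, clip_quantile=None,
                            log_y=False, width=7.0, aspect=0.55):
    """Histogram of importance weights; returns the Matplotlib figure."""
    w = np.asarray(weights, dtype=float)

    # Assumed meaning of clip_quantile: cap extreme values so the tail
    # does not stretch the x-axis.
    if clip_quantile is not None and w.size > 0:
        w = np.minimum(w, np.quantile(w, clip_quantile))

    fig, ax = plt.subplots(figsize=(width, width * aspect))
    ax.hist(w, bins=bins)
    if log_y:
        ax.set_yscale("log")
    ax.set_xlabel(r"$\hat{w}$")
    ax.set_ylabel("Count")
    return fig
```

Returning the figure rather than showing it lets the caller decide on the file format and resolution, for example with `fig.savefig("weights.pdf")`.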