Diagnostics
Diagnostics utilities.
action_overlap_slices(actions, behavior_action_probs, target_action_probs, *, action_space_n, top_k=5)
Summarize overlap metrics per action.
behavior_calibration_from_metadata(metadata)
Return calibration diagnostics stored in dataset metadata, if present.
compute_overlap_metrics(target_action_probs, behavior_action_probs, mask=None, threshold=0.001)
Compute overlap diagnostics between target and behavior policies.
Estimand
Not applicable.
Assumptions: Logged propensities are accurate and non-zero for observed actions.
Inputs:
target_action_probs: Array of target-policy probabilities for the observed actions.
behavior_action_probs: Array of behavior-policy propensities for the observed actions.
mask: Optional boolean mask marking valid steps.
threshold: Minimum acceptable behavior probability.
Outputs: Dictionary of overlap metrics.
Failure modes: If behavior_action_probs contains zeros, the target/behavior ratios may be infinite.
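To make the inputs and the failure mode concrete, here is a minimal sketch of what an overlap check of this kind could compute. It is not the library's implementation: the function name `overlap_metrics_sketch` and the dictionary keys are hypothetical, chosen only for illustration, and the real function may report different metrics.

```python
import numpy as np


def overlap_metrics_sketch(target_action_probs, behavior_action_probs,
                           mask=None, threshold=1e-3):
    """Illustrative only: the key names below are assumptions, not the real API."""
    target = np.asarray(target_action_probs, dtype=float)
    behavior = np.asarray(behavior_action_probs, dtype=float)

    # Keep only valid steps when a mask is supplied; otherwise use everything.
    if mask is not None:
        valid = np.asarray(mask, dtype=bool)
        target, behavior = target[valid], behavior[valid]
    else:
        target, behavior = target.ravel(), behavior.ravel()

    if behavior.size == 0:
        return {"min_behavior_prob": 0.0,
                "fraction_below_threshold": 0.0,
                "max_ratio": 0.0}

    # Division by a zero propensity yields inf (or nan for 0/0); silence the
    # NumPy warning so the caller can inspect the value instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = target / behavior

    return {
        "min_behavior_prob": float(behavior.min()),
        "fraction_below_threshold": float(np.mean(behavior < threshold)),
        "max_ratio": float(np.max(ratios)),
    }
```

A zero behavior propensity paired with a positive target probability produces an infinite `max_ratio` in this sketch, which is the failure mode described above.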
effective_sample_size(weights)
Compute effective sample size from non-negative weights.
Estimand
Not applicable.
Assumptions: Weights are non-negative.
Inputs:
weights: Array of importance weights.
Outputs: Effective sample size scalar.
Failure modes: Returns 0 if weights are empty or sum to zero.
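The page does not state which formula is used. A common choice is the Kish effective sample size, (sum of weights)^2 / (sum of squared weights). The sketch below assumes that definition and mirrors the documented zero-return behavior; `ess_sketch` is a hypothetical name, not part of the library.

```python
import numpy as np


def ess_sketch(weights):
    """Kish effective sample size (assumed definition, not confirmed by the docs)."""
    w = np.asarray(weights, dtype=float).ravel()
    total = w.sum()
    # Mirror the documented failure mode: empty or all-zero weights give 0.
    if w.size == 0 or total == 0.0:
        return 0.0
    return float(total ** 2 / np.sum(w ** 2))
```

Under this definition, four equal weights give an effective sample size of 4, while a single non-zero weight among four gives 1.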
ess_ratio(weights)
Compute ESS divided by sample count.
Estimand
Not applicable.
Assumptions: Weights are non-negative.
Inputs:
weights: Array of importance weights.
Outputs: ESS ratio scalar.
Failure modes: Returns 0 if weights are empty or sum to zero.
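As a rough guide to reading the ratio: assuming the Kish definition of effective sample size (an assumption, since the page does not give the formula), the ratio is 1 when all weights are equal and approaches 1/n when one weight dominates. The hypothetical helper `ess_ratio_sketch` below illustrates this.

```python
import numpy as np


def ess_ratio_sketch(weights):
    """ESS divided by sample count, assuming the Kish ESS formula."""
    w = np.asarray(weights, dtype=float).ravel()
    total = w.sum()
    # Same documented edge case: empty or zero-sum weights give 0.
    if w.size == 0 or total == 0.0:
        return 0.0
    ess = total ** 2 / np.sum(w ** 2)
    return float(ess / w.size)


# Equal weights: every sample contributes, so the ratio is at its maximum.
uniform_ratio = ess_ratio_sketch([2.0, 2.0, 2.0, 2.0, 2.0])
# One dominant weight: effectively a single sample out of five.
degenerate_ratio = ess_ratio_sketch([10.0, 0.0, 0.0, 0.0, 0.0])
```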
state_shift_diagnostics(states, weights=None, *, max_samples=1000, seed=0)
Estimate state distribution shift using weighted vs. unweighted samples.
timestep_weight_slices(ratios, mask, *, top_k=5)
Summarize importance ratios by timestep.
weight_tail_stats(weights, quantile=0.99, threshold=10.0)
Compute weight tail statistics.
Estimand
Not applicable.
Assumptions: Weights are non-negative.
Inputs:
weights: Array of importance weights.
quantile: Quantile level for tail summary.
threshold: Threshold to count extreme weights.
Outputs: Dictionary with tail metrics.
Failure modes: None (returns zeros for empty input).
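The exact contents of the returned dictionary are not listed here. As a sketch of how `quantile` and `threshold` might be used, the hypothetical function below reports the maximum weight, the weight at the requested quantile, and the fraction of weights above the threshold. The key names are assumptions made for illustration.

```python
import numpy as np


def weight_tail_stats_sketch(weights, quantile=0.99, threshold=10.0):
    """Illustrative tail summary; key names are assumptions, not the real API."""
    w = np.asarray(weights, dtype=float).ravel()
    # Documented behavior: empty input returns zeros rather than raising.
    if w.size == 0:
        return {"max": 0.0,
                "quantile_value": 0.0,
                "fraction_above_threshold": 0.0}
    return {
        "max": float(w.max()),
        "quantile_value": float(np.quantile(w, quantile)),
        "fraction_above_threshold": float(np.mean(w > threshold)),
    }
```

A large gap between the quantile value and the maximum, or a non-trivial fraction above the threshold, suggests that a few samples dominate the weighted estimate.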
weight_time_diagnostics(weights, mask=None)
Summarize weight behavior over time (per timestep).
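One plausible per-timestep summary is the effective sample size at each step. The sketch below assumes weights are laid out as a 2-D array of shape (n_trajectories, horizon), that `mask` has the same shape, and that the Kish formula is used. None of this is confirmed by the page, and `per_timestep_ess_sketch` is a hypothetical name.

```python
import numpy as np


def per_timestep_ess_sketch(weights, mask=None):
    """Kish ESS per timestep for weights of shape (n_trajectories, horizon).

    The array layout and the choice of ESS as the summary are assumptions.
    """
    w = np.asarray(weights, dtype=float)
    if mask is not None:
        # Masked-out steps contribute zero weight to every sum below.
        w = np.where(np.asarray(mask, dtype=bool), w, 0.0)

    sum_w = w.sum(axis=0)
    sum_w2 = (w ** 2).sum(axis=0)

    # Leave ESS at 0 for timesteps with no valid weight mass.
    ess = np.zeros_like(sum_w)
    np.divide(sum_w ** 2, sum_w2, out=ess, where=sum_w2 > 0)
    return ess
```

A per-timestep effective sample size that drops sharply at later steps is a typical sign that cumulative importance weights are degenerating over the horizon.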
Plotting
Plotting utilities for diagnostics.
plot_ratio_histogram(ratios, bins=50, *, xlabel='$\\hat{\\nu}$', ylabel='Count', title=None, column='double', aspect=0.55, clip_quantile=None, log_y=False, ax=None)
Journal-ready histogram for target/behavior ratios.
plot_weight_histogram(weights, bins=50, *, xlabel='$\\hat{w}$', ylabel='Count', title=None, column='double', aspect=0.55, clip_quantile=None, log_y=False, ax=None)
Journal-ready histogram for importance weights. Returns the Matplotlib figure (so callers can save/export consistently).