Estimators¶
Estimator results are returned as EstimatorReport objects with a stable schema
and export utilities:
- report.to_dict() includes schema_version, value, stderr, ci, uncertainty, and diagnostics.
- report.to_dataframe() produces a one-row pandas table.
- report.save_json(path) and report.save_html(path) persist reports.
Estimators for off-policy evaluation.
BootstrapConfig
dataclass
¶
Configuration for bootstrap confidence intervals.
DRCrossFitConfig
dataclass
¶
Configuration for cross-fitting.
Estimand
Not applicable.
Assumptions: None.
Inputs:
- num_folds: Number of cross-fitting folds.
- num_iterations: Bellman iteration count for linear Q.
- ridge: Ridge regularization strength.
- seed: RNG seed for fold splitting.
Outputs: Configuration object.
Failure modes: None.
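The num_folds and seed parameters control how trajectories are partitioned for cross-fitting. A minimal standalone sketch of such a split (an illustration, not this library's implementation; the function name is hypothetical):

```python
import numpy as np

def crossfit_folds(n: int, num_folds: int, seed: int):
    """Split n trajectory indices into shuffled, near-equal folds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    # np.array_split tolerates n not divisible by num_folds
    return [np.sort(fold) for fold in np.array_split(perm, num_folds)]
```

Each trajectory lands in exactly one fold, so nuisance models fit on the other folds never see the data they are evaluated on.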
DRLConfig
dataclass
¶
Configuration for Double Reinforcement Learning (DRL).
DRLEstimator
¶
Bases: OPEEstimator
Double Reinforcement Learning estimator for discrete MDPs.
Estimand
PolicyValueEstimand for the target policy.
Assumptions: Sequential ignorability, overlap, Markov property.
Inputs: TrajectoryDataset with discrete state_space_n.
Outputs: EstimatorReport with value and diagnostics.
Failure modes: Requires adequate state-action coverage to estimate occupancy ratios.
DiagnosticsConfig
dataclass
¶
Configuration for diagnostics thresholds.
Estimand
Not applicable.
Assumptions: None.
Inputs:
- min_behavior_prob: Minimum behavior probability threshold.
- max_weight: Optional clipping threshold for importance weights.
- ess_threshold: Minimum ESS ratio before warnings.
- weight_tail_quantile: Quantile for tail summary.
- weight_tail_threshold: Threshold to flag heavy tails.
Outputs: Configuration object.
Failure modes: None.
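The ESS and tail thresholds above guard against degenerate importance weights. A standalone sketch of the underlying checks (not this library's implementation; the function name and return keys are illustrative):

```python
import numpy as np

def weight_diagnostics(weights, ess_threshold=0.1,
                       weight_tail_quantile=0.99,
                       weight_tail_threshold=10.0):
    """Flag low effective sample size and heavy weight tails."""
    w = np.asarray(weights, dtype=float)
    ess = w.sum() ** 2 / (w ** 2).sum()   # Kish effective sample size
    ess_ratio = float(ess / w.size)       # 1.0 when weights are uniform
    tail = float(np.quantile(w, weight_tail_quantile))
    return {
        "ess_ratio": ess_ratio,
        "tail_quantile": tail,
        "low_ess": ess_ratio < ess_threshold,
        "heavy_tail": tail > weight_tail_threshold,
    }
```

Uniform weights give an ESS ratio of 1.0; a few dominant weights drive it toward 1/n and trigger the warning.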
DoubleRLConfig
dataclass
¶
Configuration for Double RL cross-fitting.
DoubleRLEstimator
¶
DoublyRobustEstimator
¶
Bases: OPEEstimator
Doubly robust estimator with cross-fitting.
Estimand
PolicyValueEstimand for the target policy.
Assumptions: Sequential ignorability, overlap, Markov property, and known behavior propensities.
Inputs: TrajectoryDataset (n, t).
Outputs: EstimatorReport with value and diagnostics.
Failure modes: Bias if both the Q model and propensities are misspecified.
estimate(data)
¶
Estimate policy value via cross-fitted DR.
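The doubly robust estimator combines a Q-model baseline with an importance-weighted residual correction. A minimal single-step (bandit) sketch of that combination, not this library's implementation (argument names are illustrative):

```python
import numpy as np

def dr_value(propensity_ratio, rewards, q_logged, q_target):
    """Doubly robust value: model baseline plus weighted residual.

    propensity_ratio: pi(a|x) / mu(a|x) for each logged action
    q_logged:  Q-model prediction for the logged (x, a) pair
    q_target:  Q-model value of the target policy at x,
               i.e. sum_a pi(a|x) * Q(x, a)
    """
    w = np.asarray(propensity_ratio, dtype=float)
    r = np.asarray(rewards, dtype=float)
    qa = np.asarray(q_logged, dtype=float)
    v = np.asarray(q_target, dtype=float)
    # Residual term vanishes when the Q model is exact, leaving the
    # model-based estimate; weights correct it when the model is off.
    return float(np.mean(v + w * (r - qa)))
```

The estimate is unbiased if either the Q model or the propensities are correct, which is why the documented failure mode requires both to be misspecified.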
DualDICEConfig
dataclass
¶
Configuration for DualDICE.
DualDICEEstimator
¶
EstimatorReport
dataclass
¶
Report returned by estimators.
Estimand
Policy value for the estimator's target policy.
Assumptions: Assumptions are recorded in the estimand and warnings highlight issues.
Outputs:
- value: Estimated policy value.
- stderr: Estimated standard error, if available.
- ci: Optional confidence interval (low, high).
- diagnostics: Dictionary of diagnostic metrics.
- assumptions_checked: Assumptions required by the estimator.
- assumptions_flagged: Assumptions flagged by diagnostics.
- warnings: List of warning strings.
- metadata: Extra metadata (fit details, configs).
Failure modes: Diagnostics may be None if disabled.
save_html(path)
¶
Write report contents to an HTML file.
save_json(path)
¶
Write report contents to a JSON file.
to_dataframe()
¶
Return a one-row pandas DataFrame if pandas is available.
to_dict()
¶
Return a pandas-friendly dict representation.
to_html()
¶
Return a self-contained HTML report representation.
to_json()
¶
Return a JSON string representation.
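The report schema above can be sketched as a plain dataclass. This is an illustration only: field names follow the documentation, but the schema_version value is a placeholder and the real EstimatorReport carries additional fields and methods:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class ReportSketch:
    """Toy stand-in mirroring part of the EstimatorReport schema."""
    value: float
    stderr: Optional[float] = None
    ci: Optional[tuple] = None
    diagnostics: Optional[dict] = None
    warnings: list = field(default_factory=list)

    def to_dict(self):
        d = asdict(self)
        d["schema_version"] = 1  # placeholder version tag
        return d

    def to_json(self):
        return json.dumps(self.to_dict())
```

Keeping export methods as thin wrappers over to_dict() is one way the stable schema can be shared across JSON, DataFrame, and HTML outputs.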
FQEConfig
dataclass
¶
Configuration for FQE training.
Estimand
Not applicable.
Assumptions: None.
Inputs:
- hidden_sizes: Hidden layer sizes for the Q network.
- learning_rate: Optimizer learning rate.
- batch_size: Mini-batch size.
- num_epochs: Epochs per iteration.
- num_iterations: Number of fitted Q iterations.
- weight_decay: L2 penalty.
- seed: RNG seed for torch and numpy.
Outputs: Configuration object.
Failure modes: None.
FQEEstimator
¶
Bases: OPEEstimator
Fitted Q Evaluation estimator for finite-horizon MDPs.
Estimand
PolicyValueEstimand for the target policy.
Assumptions: Sequential ignorability, overlap, Markov property, and Q-model realizability.
Inputs: TrajectoryDataset (n, t).
Outputs: EstimatorReport with value and diagnostics.
Failure modes: Extrapolation error for out-of-distribution actions.
estimate(data)
¶
Estimate policy value via FQE.
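FQE repeatedly regresses Bellman backup targets under the target policy. A tabular sketch of the iteration, with an exact lookup-table "regressor" in place of the neural Q network this estimator trains (function and argument names are illustrative):

```python
import numpy as np

def tabular_fqe(transitions, target_policy, n_states, n_actions,
                gamma=1.0, num_iterations=20):
    """Fitted Q iteration on (s, a, r, s_next, done) tuples.

    target_policy: array [n_states, n_actions] of action probabilities.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(num_iterations):
        targets = np.zeros_like(q)
        counts = np.zeros_like(q)
        for s, a, r, s2, done in transitions:
            # Bellman backup under the *target* policy's action distribution
            backup = r if done else r + gamma * target_policy[s2] @ q[s2]
            targets[s, a] += backup
            counts[s, a] += 1
        # Exact regression = per-cell mean of the backup targets
        q = np.divide(targets, np.maximum(counts, 1))
    return q
```

The documented failure mode shows up here too: state-action cells with no data stay at their initial value, and a function approximator would instead extrapolate into them.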
GenDICEConfig
dataclass
¶
Configuration for GenDICE.
GenDICEEstimator
¶
HighConfidenceISConfig
dataclass
¶
Configuration for high-confidence lower bounds.
HighConfidenceISEstimator
¶
ISEstimator
¶
Bases: OPEEstimator
Trajectory-level importance sampling estimator.
Estimand
PolicyValueEstimand for the target policy.
Assumptions: Sequential ignorability, overlap/positivity, and known behavior propensities.
Inputs: LoggedBanditDataset (n,) or TrajectoryDataset (n, t).
Outputs: EstimatorReport with value and diagnostics.
Failure modes: High variance under weak overlap.
estimate(data)
¶
Estimate policy value via IS.
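Trajectory-level IS weighs each trajectory's return by the product of per-step propensity ratios. A standalone sketch of the formula, not this library's implementation (pi_probs and mu_probs are assumed (n, t) arrays of per-step action probabilities):

```python
import numpy as np

def trajectory_is(pi_probs, mu_probs, returns):
    """Average returns reweighted by full-trajectory propensity ratios."""
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    weights = ratios.prod(axis=1)  # one cumulative weight per trajectory
    return float(np.mean(weights * np.asarray(returns, dtype=float)))
```

Because the weight is a product over the whole horizon, a single small mu probability can blow it up, which is the documented high-variance failure mode under weak overlap.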
MAGICConfig
dataclass
¶
Configuration for MAGIC.
MAGICEstimator
¶
MRDRConfig
dataclass
¶
Configuration for MRDR.
MRDREstimator
¶
MarginalizedImportanceSamplingEstimator
¶
OPEEstimator
¶
Bases: ABC
Base class for off-policy evaluation estimators.
Estimand
PolicyValueEstimand.
Assumptions: Each estimator declares required assumptions.
Inputs: Dataset-specific objects such as TrajectoryDataset or LoggedBanditDataset.
Outputs: EstimatorReport with value, diagnostics, and metadata.
Failure modes: Raises ValueError if required assumptions are missing.
estimate(data)
abstractmethod
¶
Estimate policy value from data.
PDISEstimator
¶
Bases: OPEEstimator
Per-decision importance sampling estimator.
Estimand
PolicyValueEstimand for the target policy.
Assumptions: Sequential ignorability, overlap/positivity, and known behavior propensities.
Inputs: TrajectoryDataset (n, t).
Outputs: EstimatorReport with value and diagnostics.
Failure modes: Variance grows with horizon under weak overlap.
estimate(data)
¶
Estimate policy value via PDIS.
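Per-decision IS reduces variance relative to trajectory-level IS by weighting each reward only with the propensity ratios accumulated up to that step. A standalone sketch of the formula (an illustration, not this library's implementation; argument names are assumed):

```python
import numpy as np

def pdis(pi_probs, mu_probs, rewards, gamma=1.0):
    """Per-decision IS: reward at step t uses only the ratio product up to t."""
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    rho = np.cumprod(ratios, axis=1)          # cumulative ratio per step
    r = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(r.shape[1])
    return float(np.mean((discounts * rho * r).sum(axis=1)))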
UncertaintySummary
dataclass
¶
Structured summary of estimator uncertainty.
WDRConfig
dataclass
¶
Configuration for weighted doubly robust estimation.
WISEstimator
¶
Bases: OPEEstimator
Weighted importance sampling estimator.
Estimand
PolicyValueEstimand for the target policy.
Assumptions: Sequential ignorability, overlap/positivity, and known behavior propensities.
Inputs: LoggedBanditDataset (n,) or TrajectoryDataset (n, t).
Outputs: EstimatorReport with value and diagnostics.
Failure modes: Bias from normalization in small samples.
estimate(data)
¶
Estimate policy value via WIS.
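WIS normalizes the trajectory weights so they sum to one, trading the variance of plain IS for the small-sample bias noted above. A standalone sketch of that normalization (not this library's implementation; argument names are assumed):

```python
import numpy as np

def wis(pi_probs, mu_probs, returns):
    """Weighted IS: self-normalized trajectory weights."""
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    w = ratios.prod(axis=1)
    # Dividing by sum(w) instead of n bounds the estimate by the
    # observed returns, at the cost of finite-sample bias.
    return float(np.sum(w * np.asarray(returns, dtype=float)) / np.sum(w))
```

When pi equals mu all weights are 1 and WIS reduces to the sample mean of returns.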