High-Confidence Off-Policy Evaluation (HCOPE)¶
Implementation: crl.estimators.high_confidence.HighConfidenceISEstimator
Assumptions¶
- Sequential ignorability
- Overlap/positivity
- Bounded rewards
Requires¶
- behavior_action_probs for logged actions
Diagnostics to check¶
- overlap.support_violations
- ess.ess_ratio
- weights.tail_fraction
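As a rough illustration of what an effective-sample-size diagnostic measures, the sketch below computes the standard Kish ESS ratio from importance weights. The function name `ess_ratio` and its input are hypothetical and chosen for this example; the library's `ess.ess_ratio` diagnostic may be computed differently.

```python
import numpy as np


def ess_ratio(weights):
    """Kish effective sample size divided by the number of samples.

    A value near 1 means the weights are close to uniform; a value near
    1/n means a single sample dominates the estimate.
    """
    w = np.asarray(weights, dtype=float)
    ess = w.sum() ** 2 / np.square(w).sum()
    return ess / w.size


print(ess_ratio([1.0, 1.0, 1.0, 1.0]))  # uniform weights
print(ess_ratio([4.0, 0.0, 0.0, 0.0]))  # one sample carries all the weight
```

A low ratio suggests weak overlap, which is the regime where the lower bound tends to become vacuous.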
Formula (sketch)¶
Compute a clipped importance sampling (IS) estimate, then apply a concentration inequality (empirical Bernstein or Hoeffding) with bias correction. The clipping parameter is chosen to maximize the resulting lower bound.
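The sketch below illustrates the general idea with NumPy. It is not the library's implementation. It assumes non-negative returns (so clipping can only lower the mean), uses the Maurer-Pontil empirical Bernstein inequality, omits any bias correction, and selects the clipping value on one half of the data before computing the bound on the other half. All function names and arguments are hypothetical.

```python
import numpy as np


def empirical_bernstein_lower_bound(x, upper, delta):
    """Lower confidence bound on E[x] for i.i.d. samples in [0, upper].

    Maurer-Pontil empirical Bernstein inequality; the bound holds with
    probability at least 1 - delta.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two samples")
    log_term = np.log(2.0 / delta)
    var = x.var(ddof=1)  # unbiased sample variance
    return (
        x.mean()
        - np.sqrt(2.0 * var * log_term / n)
        - 7.0 * upper * log_term / (3.0 * (n - 1))
    )


def clipped_is_lower_bound(weights, returns, clip, delta):
    """Lower bound computed from IS terms clipped to [0, clip].

    Assumes returns >= 0, so the clipped mean never exceeds the unclipped
    one and the bound remains valid for the unclipped value.
    """
    terms = np.minimum(np.asarray(weights) * np.asarray(returns), clip)
    return empirical_bernstein_lower_bound(terms, clip, delta)


def select_clip_and_bound(weights, returns, delta, clip_grid, seed=0):
    """Pick the clip on one half of the data, bound on the other half."""
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(returns, dtype=float)
    idx = np.random.default_rng(seed).permutation(weights.size)
    half = weights.size // 2
    tune, held = idx[:half], idx[half:]
    best_clip = max(
        clip_grid,
        key=lambda c: clipped_is_lower_bound(
            weights[tune], returns[tune], c, delta
        ),
    )
    bound = clipped_is_lower_bound(
        weights[held], returns[held], best_clip, delta
    )
    return best_clip, bound
```

Splitting the data keeps the choice of clipping value independent of the samples used for the final bound. Tuning and bounding on the same samples would weaken the stated confidence level.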
Uncertainty¶
- Returns a lower bound (not a symmetric CI).
- Confidence level set by delta in HighConfidenceISConfig.
Failure modes¶
- Requires a valid reward bound; inferred bounds are heuristic.
- Bounds can be vacuous with weak overlap.
Minimal example¶
from crl.estimators.high_confidence import HighConfidenceISEstimator, HighConfidenceISConfig

# estimand and dataset are assumed to be defined earlier; dataset must include behavior_action_probs.
config = HighConfidenceISConfig(delta=0.05, bound="empirical_bernstein")
report = HighConfidenceISEstimator(estimand, config=config).estimate(dataset)
References¶
- Thomas et al. (2015)