High-Confidence Off-Policy Evaluation (HCOPE)

Implementation: crl.estimators.high_confidence.HighConfidenceISEstimator

Assumptions

  • Sequential ignorability
  • Overlap/positivity
  • Bounded rewards

Requires

  • behavior_action_probs for logged actions

Diagnostics to check

  • overlap.support_violations
  • ess.ess_ratio
  • weights.tail_fraction
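
As a rough illustration of what these three diagnostics measure, the sketch below computes them from per-sample action probabilities. It is not the library's implementation: the function name, the `tail_threshold` and `min_prob` parameters, and the exact definitions (especially of the tail fraction) are assumptions made for this example.

```python
def overlap_diagnostics(target_probs, behavior_probs, tail_threshold=10.0, min_prob=1e-8):
    """Illustrative overlap diagnostics (hypothetical definitions, not the crl API).

    target_probs / behavior_probs: probability of each logged action under the
    target and behavior policies, respectively.
    """
    if len(target_probs) != len(behavior_probs):
        raise ValueError("probability lists must have the same length")
    if not target_probs:
        raise ValueError("at least one sample is required")

    support_violations = 0
    weights = []
    for p_target, p_behavior in zip(target_probs, behavior_probs):
        if p_behavior < min_prob:
            # The target policy takes an action the behavior policy
            # (almost) never takes: the importance weight is undefined.
            if p_target > 0.0:
                support_violations += 1
            continue
        weights.append(p_target / p_behavior)

    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    # Effective sample size: (sum of weights)^2 / (sum of squared weights).
    ess = (sum_w ** 2) / sum_w2 if sum_w2 > 0.0 else 0.0

    return {
        "support_violations": support_violations,
        "ess_ratio": ess / n if n else 0.0,
        # Assumed definition: share of samples whose weight exceeds the threshold.
        "tail_fraction": sum(w > tail_threshold for w in weights) / n if n else 0.0,
    }
```

An ESS ratio near 1 indicates well-balanced weights; a ratio near 0, any support violations, or a large tail fraction suggest that the lower bound will be loose.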

Formula (sketch)

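For each candidate clipping parameter, compute a clipped importance-sampling (IS) estimate and apply a concentration inequality (empirical Bernstein or Hoeffding) with bias correction to obtain a lower bound on the policy value. Select the clipping parameter that maximizes this lower bound.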
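
The procedure can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the code in HighConfidenceISEstimator: it assumes trajectory-level weights and nonnegative returns (so clipping can only lower the mean and no separate bias correction is needed), and it keeps the guarantee valid across candidates with a union bound (delta divided by the number of candidates). The function names and the `clip_candidates` argument are hypothetical.

```python
import math


def lower_bound(values, upper, delta, bound="empirical_bernstein"):
    """One-sided lower bound on the mean of i.i.d. values in [0, upper]."""
    n = len(values)
    mean = sum(values) / n
    if bound == "hoeffding":
        return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    if bound == "empirical_bernstein":
        if n < 2:
            raise ValueError("empirical Bernstein needs at least two samples")
        # Unbiased sample variance, as used in the Maurer-Pontil bound.
        var = sum((v - mean) ** 2 for v in values) / (n - 1)
        log_term = math.log(2.0 / delta)
        return (mean
                - math.sqrt(2.0 * var * log_term / n)
                - 7.0 * upper * log_term / (3.0 * (n - 1)))
    raise ValueError(f"unknown bound: {bound}")


def clipped_is_lower_bound(weights, returns, clip_candidates, delta=0.05,
                           bound="empirical_bernstein"):
    """Return (best_lower_bound, best_clip) over the candidate clip values."""
    if any(g < 0.0 for g in returns):
        raise ValueError("this sketch assumes nonnegative returns")
    if not clip_candidates or any(c <= 0.0 for c in clip_candidates):
        raise ValueError("clip candidates must be positive")

    # Union bound: each candidate gets delta / K so the maximum stays valid.
    delta_each = delta / len(clip_candidates)
    best_lb, best_clip = -math.inf, None
    for clip in clip_candidates:
        # Clipped weighted returns lie in [0, clip] and their mean
        # underestimates the unclipped IS mean.
        clipped = [min(w * g, clip) for w, g in zip(weights, returns)]
        lb = lower_bound(clipped, clip, delta_each, bound)
        if lb > best_lb:
            best_lb, best_clip = lb, clip
    return best_lb, best_clip
```

A small clipping value shrinks the range term of the inequality but discards more of the weighted return, so the maximizing value balances the two effects.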

Uncertainty

  • Returns a one-sided lower bound on the policy value, not a symmetric confidence interval.
  • The confidence level is set by delta in HighConfidenceISConfig.

Failure modes

  • The guarantee requires a valid bound on the rewards; a bound inferred from the data is only a heuristic.
  • With weak overlap, the importance weights become large and the lower bound can be vacuous.
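
To see why weak overlap makes the bound vacuous, consider the Hoeffding term alone: for clipped values in [0, c] its width is c * sqrt(ln(1/delta) / (2n)), so the number of samples needed for a given width grows with c squared. The helper below is a hypothetical back-of-the-envelope calculation, not part of the library.

```python
import math


def hoeffding_samples_needed(clip, delta, target_width):
    """Smallest n with clip * sqrt(ln(1/delta) / (2n)) <= target_width."""
    if clip <= 0.0 or target_width <= 0.0 or not 0.0 < delta < 1.0:
        raise ValueError("clip and target_width must be positive, delta in (0, 1)")
    return math.ceil(clip ** 2 * math.log(1.0 / delta) / (2.0 * target_width ** 2))


# Compare a mild clipping level with one ten times larger (illustrative values).
n_small_clip = hoeffding_samples_needed(clip=1.0, delta=0.05, target_width=0.1)
n_large_clip = hoeffding_samples_needed(clip=10.0, delta=0.05, target_width=0.1)
```

Raising the clipping level tenfold to retain large weights requires roughly a hundred times more data for the same width; with a fixed dataset, the bound instead drops toward the trivial lower limit of the return range.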

Minimal example

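from crl.estimators.high_confidence import HighConfidenceISEstimator, HighConfidenceISConfig

# delta sets the confidence level; bound selects the concentration inequality.
config = HighConfidenceISConfig(delta=0.05, bound="empirical_bernstein")
# estimand and dataset are assumed to be defined beforehand;
# dataset must include behavior_action_probs for the logged actions.
report = HighConfidenceISEstimator(estimand, config=config).estimate(dataset)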

References

  • Thomas et al. (2015)