Double Reinforcement Learning (Bandit)¶
Implementation: crl.estimators.double_rl.DoubleRLEstimator
Assumptions¶
- Sequential ignorability
- Overlap/positivity
Requires¶
LoggedBanditDatasetbehavior_action_probsoptional (estimated if contexts are discrete)
Diagnostics to check¶
overlap.support_violationsess.ess_ratio
Formula¶
Orthogonalized score using outcome and propensity models in a bandit setting.
Uncertainty¶
- Normal-approximation CI by default.
- Bootstrap CI available via
bootstrap=True.
Failure modes¶
- Misspecified nuisance models can increase variance.
Minimal example¶
from crl.estimators.double_rl import DoubleRLEstimator
report = DoubleRLEstimator(estimand).estimate(dataset)
References¶
- Kallus & Uehara (2020)