Weighted Doubly Robust (WDR)
Implementation: crl.estimators.wdr.WeightedDoublyRobustEstimator
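Estimand
The expected discounted return of the target policy $\pi$ over horizon $T$ with discount factor $\gamma$:
$V^\pi = \mathbb{E}_\pi\left[\sum_{t=0}^{T-1} \gamma^t r_t\right]$.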
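As a concrete reading of the estimand, the sketch below computes the discounted return of a single reward sequence. Averaging this quantity over trajectories sampled from $\pi$ itself would give a Monte Carlo estimate of $V^\pi$. The helper name `discounted_return` is illustrative and is not part of CRL.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * r_t for one trajectory's reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```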
Assumptions
- Sequential ignorability
- Overlap/positivity
- Markov property
Inputs required
- TrajectoryDataset
- behavior_action_probs for logged actions
- Q-model fit (linear by default in CRL)
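Algorithm
WDR replaces the raw cumulative importance weights of the doubly robust estimator with weights normalized across trajectories at each time step, $\bar w_{it} = \rho_{i,0:t} / \sum_j \rho_{j,0:t}$, where $\rho_{i,0:t}$ is the product of target-to-behavior action probability ratios for trajectory $i$ up to step $t$. Normalization reduces variance at the cost of some bias.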
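The per-step normalization can be sketched in plain Python as follows. This is a minimal illustration, not CRL's implementation: the function name and the nested-list layout (`probs[i][t]` for trajectory `i` at step `t`) are assumptions, all trajectories are assumed to have the same length, and behavior probabilities are assumed to be strictly positive (the overlap assumption).

```python
def normalized_step_weights(target_probs, behavior_probs):
    """Per-step self-normalized importance weights.

    target_probs[i][t] and behavior_probs[i][t] hold the probability of the
    logged action under the target and behavior policy, respectively.
    """
    n = len(target_probs)
    horizon = len(target_probs[0])

    # Cumulative ratios: rho[i][t] = prod_{k <= t} pi(a_k | s_k) / mu(a_k | s_k)
    rho = []
    for p_row, b_row in zip(target_probs, behavior_probs):
        running, row = 1.0, []
        for p, b in zip(p_row, b_row):
            running *= p / b
            row.append(running)
        rho.append(row)

    # Normalize across trajectories separately at every time step.
    weights = [[0.0] * horizon for _ in range(n)]
    for t in range(horizon):
        total = sum(rho[i][t] for i in range(n))
        for i in range(n):
            weights[i][t] = rho[i][t] / total
    return weights


demo_weights = normalized_step_weights(
    target_probs=[[0.5, 0.5], [1.0, 0.5]],
    behavior_probs=[[0.5, 0.5], [0.5, 0.5]],
)
```

At every step the weights sum to one over trajectories, so a few very large ratios cannot inflate the estimate without bound.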
Formula
$\hat V = \sum_i \bar w_{i0} \hat V(s_{i0}) + \sum_t \sum_i \gamma^t \bar w_{it} \left(r_{it} + \gamma \hat V(s_{i,t+1}) - \hat Q(s_{it}, a_{it})\right)$.
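The formula can be transcribed term by term, as in the sketch below. It is an illustration under stated assumptions rather than CRL's code: the function name and argument layout are hypothetical, `weights[i][t]` holds the normalized weights $\bar w_{it}$, `q_hat[i][t]` holds $\hat Q(s_{it}, a_{it})$, and `v_hat[i]` has one more entry than `rewards[i]` so that `v_hat[i][t + 1]` is defined at the last step (with the terminal value assumed to be 0).

```python
def wdr_estimate(weights, rewards, q_hat, v_hat, gamma):
    """Evaluate the WDR formula as written on this page."""
    n = len(rewards)

    # Baseline term: sum_i w_bar_{i0} * V_hat(s_{i0})
    estimate = sum(weights[i][0] * v_hat[i][0] for i in range(n))

    # Correction term: weighted, discounted residuals of the value model
    for i in range(n):
        for t, r in enumerate(rewards[i]):
            residual = r + gamma * v_hat[i][t + 1] - q_hat[i][t]
            estimate += (gamma ** t) * weights[i][t] * residual
    return estimate


# With a zero value model the estimate reduces to weighted importance sampling.
zero_model = wdr_estimate(
    weights=[[0.5, 0.5], [0.5, 0.5]],
    rewards=[[1.0, 1.0], [3.0, 1.0]],
    q_hat=[[0.0, 0.0], [0.0, 0.0]],
    v_hat=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    gamma=0.5,
)

# With an exact value model every residual is zero and only the baseline remains.
exact_model = wdr_estimate(
    weights=[[1.0, 1.0]],
    rewards=[[1.0, 1.0]],
    q_hat=[[1.5, 1.0]],
    v_hat=[[1.5, 1.0, 0.0]],
    gamma=0.5,
)
```

The two calls show the doubly robust structure: the weights carry the estimate when the model is uninformative, and the model carries it when the residuals vanish.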
Diagnostics
- overlap.support_violations
- ess.ess_ratio
- model.q_model_mse
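To give a sense of what an effective-sample-size ratio measures, the sketch below uses the common Kish definition, $(\sum_i w_i)^2 / \sum_i w_i^2$, divided by the number of trajectories. Whether CRL computes `ess.ess_ratio` exactly this way is an assumption; the function here is hypothetical and only illustrates the idea.

```python
def ess_ratio(weights):
    """Kish effective sample size divided by the number of weights."""
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    return (total * total / sum_of_squares) / len(weights)


uniform = ess_ratio([1.0, 1.0, 1.0, 1.0])       # all trajectories contribute equally
degenerate = ess_ratio([1.0, 0.0, 0.0, 0.0])    # one trajectory carries all the weight
```

A ratio near 1 indicates evenly spread weights, while a ratio near 1/n indicates that a single trajectory dominates, which is typical of weak overlap.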
Uncertainty
- Normal-approximation CI by default.
- Bootstrap CI available via bootstrap=True.
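A percentile bootstrap over whole trajectories is one common way to build such an interval. The sketch below is generic and does not describe what CRL does internally when bootstrap=True is set; `bootstrap_ci` and `estimate_fn` are hypothetical names. For WDR, `estimate_fn` would need to recompute the normalized weights on each resample.

```python
import random


def bootstrap_ci(trajectories, estimate_fn, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling trajectories with replacement."""
    rng = random.Random(seed)
    n = len(trajectories)
    stats = []
    for _ in range(n_boot):
        # Resample n whole trajectories, then re-run the estimator on them.
        sample = [trajectories[rng.randrange(n)] for _ in range(n)]
        stats.append(estimate_fn(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


def sample_mean(values):
    return sum(values) / len(values)


# Toy usage: per-trajectory returns stand in for trajectories.
lower, upper = bootstrap_ci([1.0, 2.0, 3.0, 4.0], sample_mean, n_boot=200, seed=1)
```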
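Failure modes
- Normalization introduces bias in small samples. In the extreme case of a single trajectory, every normalized weight equals 1, so the importance correction vanishes. The bias shrinks as the number of trajectories grows.
- Sensitive to model misspecification when overlap is weak: the normalized weights concentrate on a few trajectories, so the estimate leans heavily on the Q-model.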
Minimal example
```python
from crl.estimators.wdr import WeightedDoublyRobustEstimator

# `estimand` and `dataset` (a TrajectoryDataset) are assumed to be defined already.
report = WeightedDoublyRobustEstimator(estimand).estimate(dataset)
```
References
- Thomas, P. S. & Brunskill, E. (2016). Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. ICML.