Quickstart: Bandit OPE
This is the smallest end-to-end loop: synthesize data, define an estimand, run estimators, and read diagnostics.
1) Generate data
from crl.benchmarks.bandit_synth import SyntheticBandit, SyntheticBanditConfig
benchmark = SyntheticBandit(SyntheticBanditConfig(seed=0))
dataset = benchmark.sample(num_samples=1000, seed=1)
Behavior propensities
Synthetic benchmarks include behavior_action_probs. If your real data
does not, see Behavior Propensities Missing.
2) Define the estimand
from crl.assumptions import AssumptionSet
from crl.assumptions_catalog import BEHAVIOR_POLICY_KNOWN, OVERLAP, SEQUENTIAL_IGNORABILITY
from crl.estimands.policy_value import PolicyValueEstimand
estimand = PolicyValueEstimand(
    policy=benchmark.target_policy,
    discount=1.0,
    horizon=1,
    assumptions=AssumptionSet([SEQUENTIAL_IGNORABILITY, OVERLAP, BEHAVIOR_POLICY_KNOWN]),
)
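With horizon=1 and discount=1.0 there is a single undiscounted step, so the policy value reduces to the expected reward under the target policy. As a sketch in assumed notation (x for context, a for action, r for reward, \pi for the target policy, \mu for the behavior policy; none of these symbols come from the library), the three listed assumptions are what allow that value to be rewritten as an expectation over logged data:

```latex
V(\pi)
  = \mathbb{E}_{x}\Big[\sum_{a} \pi(a \mid x)\, \mathbb{E}[r \mid x, a]\Big]
  = \mathbb{E}_{x,\; a \sim \mu(\cdot \mid x)}\Big[\frac{\pi(a \mid x)}{\mu(a \mid x)}\, r\Big]
```

The second equality needs ignorability (the logged reward is unconfounded given x and a), overlap (\mu(a \mid x) > 0 wherever \pi(a \mid x) > 0), and known behavior propensities so that the ratio can be computed.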
3) Run estimators and read diagnostics
from crl.estimators.importance_sampling import ISEstimator, WISEstimator

# Run both estimators on the same dataset and print each value with its diagnostics.
for estimator in [ISEstimator(estimand), WISEstimator(estimand)]:
    report = estimator.estimate(dataset)
    print(report.value, report.diagnostics)
Key questions to ask:
- Do overlap diagnostics show support violations?
- Is the effective sample size (ESS) low, or are the importance-weight tails heavy?
- Do the IS and WIS estimates differ substantially from each other?
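To make these questions concrete, here is a minimal library-independent sketch of the quantities involved. It assumes you have, for each logged sample, the target policy's probability of the logged action, the behavior propensity, and the reward. The function name weight_diagnostics and the returned keys are hypothetical and are not part of crl; the fields that report.diagnostics actually exposes may differ.

```python
def weight_diagnostics(target_probs, behavior_probs, rewards):
    """Compute IS and WIS estimates plus simple weight diagnostics.

    Hypothetical helper for illustration only; not part of crl.
    """
    if any(p <= 0 for p in behavior_probs):
        raise ValueError("behavior propensities must be strictly positive")

    # Importance weight for each logged sample: pi(a|x) / mu(a|x).
    weights = [t / b for t, b in zip(target_probs, behavior_probs)]
    total_weight = sum(weights)
    weighted_reward = sum(w * r for w, r in zip(weights, rewards))

    # Kish effective sample size: (sum w)^2 / sum w^2.
    ess = total_weight ** 2 / sum(w * w for w in weights)

    return {
        "is": weighted_reward / len(weights),   # ordinary importance sampling
        "wis": weighted_reward / total_weight,  # self-normalized (weighted) IS
        "ess": ess,
        "ess_fraction": ess / len(weights),
        # Share of total weight carried by the single largest weight.
        "max_weight_share": max(weights) / total_weight,
    }


diag = weight_diagnostics(
    target_probs=[0.5, 0.5, 0.5, 0.5],
    behavior_probs=[0.5, 0.5, 0.25, 0.25],
    rewards=[1.0, 0.0, 1.0, 0.0],
)
print(diag)
```

In this toy input the weights are 1, 1, 2, 2, so the ESS is 3.6 out of 4 samples, IS gives 0.75, and WIS gives 0.5. A gap of that size between the two estimators on real data would be a reason to look more closely at the weight distribution.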
4) Compare to ground truth (for synthetic data)
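As a library-independent sketch of this comparison, the following builds a hypothetical two-armed bandit whose true target-policy value is known in closed form, logs data under a uniform behavior policy, and measures how far IS and WIS land from the truth. None of the names below come from crl, and this is not the SyntheticBandit benchmark; it only illustrates the kind of check that synthetic data makes possible.

```python
import random

# Hypothetical two-armed bandit, for illustration only.
REWARD_MEANS = [0.2, 0.7]  # Bernoulli reward mean for each action
BEHAVIOR = [0.5, 0.5]      # logging (behavior) policy
TARGET = [0.2, 0.8]        # policy being evaluated


def true_value():
    """Exact target-policy value: sum over actions of pi(a) * E[r | a]."""
    return sum(p * m for p, m in zip(TARGET, REWARD_MEANS))


def sample_logs(num_samples, seed):
    """Draw (action, reward) pairs under the behavior policy."""
    rng = random.Random(seed)
    logs = []
    for _ in range(num_samples):
        action = 0 if rng.random() < BEHAVIOR[0] else 1
        reward = 1.0 if rng.random() < REWARD_MEANS[action] else 0.0
        logs.append((action, reward))
    return logs


def is_and_wis(logs):
    """Return the (IS, WIS) estimates of the target-policy value."""
    weights = [TARGET[a] / BEHAVIOR[a] for a, _ in logs]
    weighted = sum(w * r for w, (_, r) in zip(weights, logs))
    return weighted / len(logs), weighted / sum(weights)


logs = sample_logs(20000, seed=1)
is_value, wis_value = is_and_wis(logs)
truth = true_value()

print(f"true value: {truth:.3f}")
print(f"IS:  {is_value:.3f} (error {is_value - truth:+.3f})")
print(f"WIS: {wis_value:.3f} (error {wis_value - truth:+.3f})")
```

With overlap satisfied and this many samples, both errors should be small. Large or persistent errors on a synthetic benchmark usually point to a problem with the propensities or the estimator configuration rather than with the data.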
Next steps
- Read the dataset contract: Dataset Format and Validation
- Run the full quickstart script:
python -m examples.quickstart.bandit_ope
- Learn how to export results: Export and Reporting