CausalRL
Estimand-first causal RL and off-policy evaluation
Know what you're estimating. Know when to trust it. Know how it was produced.
Package: causalrl · Import: crl · Version: 0.2.0 · GitHub
Why CausalRL?
- Estimand-first Design: Every estimator is tied to a formal estimand with explicit identification assumptions. Know what you're estimating.
- Diagnostics by Default: Overlap, ESS, weight tails, and shift checks run automatically. Know when to trust your estimates.
- 20+ Estimators: IS, WIS, DR, WDR, MAGIC, MRDR, MIS, FQE, DualDICE, GenDICE, and DRL, all in a unified pipeline.
- Sensitivity Analysis: Bounded-confounding curves for bandits and sequential settings. Quantify robustness to hidden confounders.
- D4RL & RL Unplugged: Built-in adapters for standard RL benchmarks. Load datasets with one line of code.
- Audit-Ready Reports: HTML reports with tables, figures, and full metadata bundles. Share reproducible results.
- Ground-Truth Benchmarks: Synthetic bandit/MDP suites with known true values. Validate estimators before deployment.
- Production Ready: Type-checked, tested, with deterministic seeding throughout. Built for research reliability.
60-Second Quickstart
```python
from crl.benchmarks.bandit_synth import SyntheticBandit, SyntheticBanditConfig
from crl.ope import evaluate_ope

# Create benchmark with known ground truth
benchmark = SyntheticBandit(SyntheticBanditConfig(seed=0))
dataset = benchmark.sample(num_samples=1000, seed=1)

# Run end-to-end evaluation
report = evaluate_ope(dataset=dataset, policy=benchmark.target_policy)

# View results and generate report
print(report.summary_table())
report.save_html("report.html")
```
Scope
The current evaluate pipeline assumes discrete action spaces for importance sampling estimators. See Limitations for details on continuous actions.
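To make the discrete-action assumption concrete, here is a minimal NumPy sketch of the IS and WIS estimators on a toy context-free bandit. It is not CausalRL's implementation: the function `is_and_wis`, the three-action policies, and the reward means are all hypothetical and chosen only for illustration.

```python
import numpy as np


def is_and_wis(rewards, target_probs, behavior_probs):
    """Return (IS, WIS) value estimates for logged bandit data.

    All inputs have shape (n,). The probabilities are those of the
    *logged* action under the target and behavior policies.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(
        behavior_probs, dtype=float
    )
    # IS: unbiased under overlap, but can have high variance.
    is_estimate = float(np.mean(weights * rewards))
    # WIS: self-normalized weights; biased in finite samples, lower variance.
    wis_estimate = float(np.sum(weights * rewards) / np.sum(weights))
    return is_estimate, wis_estimate


# Hypothetical toy problem: 3 discrete actions, no context.
rng = np.random.default_rng(0)
behavior = np.array([0.5, 0.3, 0.2])     # logging policy
target = np.array([0.1, 0.2, 0.7])       # policy to evaluate
mean_reward = np.array([0.0, 0.5, 1.0])  # known ground truth per action
true_value = float(target @ mean_reward)

actions = rng.choice(3, size=5000, p=behavior)
rewards = mean_reward[actions] + rng.normal(0.0, 0.1, size=5000)

is_est, wis_est = is_and_wis(rewards, target[actions], behavior[actions])
print(f"true={true_value:.3f}  IS={is_est:.3f}  WIS={wis_est:.3f}")
```

Because the importance weight is a ratio of action probabilities, this form needs a probability mass for each logged action, which is why continuous actions require a different treatment.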
The Three Pillars
| Pillar | Why It Matters | What You Get |
|---|---|---|
| Estimands | Know what quantity you're estimating—not just which estimator | Explicit estimands with identification assumptions via AssumptionSet |
| Diagnostics | Know when an estimate is fragile before acting on it | Overlap checks, ESS, weight tails, shift diagnostics, sensitivity curves |
| Evidence | Know how results were produced for auditing | Versioned configs, deterministic seeds, structured report bundles |
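As a rough illustration of two of the diagnostics named in the table, the sketch below computes Kish's effective sample size (ESS) and a simple weight-tail indicator from a vector of importance weights. The helper names `effective_sample_size` and `max_weight_share` are hypothetical and are not taken from the `crl` API.

```python
import numpy as np


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))


def max_weight_share(weights):
    """Fraction of total weight carried by the single largest weight."""
    w = np.asarray(weights, dtype=float)
    return float(w.max() / w.sum())


# Uniform weights: every sample counts equally.
uniform = np.ones(100)
# Skewed weights: one sample dominates the estimate.
skewed = np.array([100.0] + [1.0] * 99)

for name, w in [("uniform", uniform), ("skewed", skewed)]:
    print(name, effective_sample_size(w), max_weight_share(w))
```

A low ESS relative to the number of logged samples, or a single weight carrying most of the mass, is a signal that an importance-weighted estimate rests on very few observations.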
Results Gallery
Why Trust CausalRL?
- Explicit Assumptions: Every estimand declares its identification assumptions via AssumptionSet, with no hidden requirements.
- Deterministic Benchmarks: Synthetic generators with fixed seeds produce identical results across runs.
- Comprehensive Testing: Test suite covering estimators, diagnostics, and full pipeline integration.
- Docs ↔ Code Parity: Automated checks keep formulas and APIs aligned with documentation.
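The determinism claim can be illustrated with a small sketch of seeded sampling using NumPy's `default_rng`. The generator function `sample_rewards` below is hypothetical and stands in for a synthetic benchmark; it is not CausalRL's generator.

```python
import numpy as np


def sample_rewards(num_samples, seed):
    """Draw a toy logged dataset from a seeded generator (hypothetical)."""
    rng = np.random.default_rng(seed)
    actions = rng.integers(0, 3, size=num_samples)
    # Reward mean equals the action index; unit-variance Gaussian noise.
    rewards = rng.normal(loc=actions.astype(float), scale=1.0)
    return actions, rewards


# Same seed: identical data on every run.
a1, r1 = sample_rewards(1000, seed=1)
a2, r2 = sample_rewards(1000, seed=1)
same_seed_matches = np.array_equal(a1, a2) and np.array_equal(r1, r2)

# Different seed: a different draw.
a3, r3 = sample_rewards(1000, seed=2)
different_seed_differs = not np.array_equal(r1, r3)

print(same_seed_matches, different_seed_differs)
```

Passing the seed explicitly and creating a fresh generator inside the function avoids hidden global random state, which is one common way to keep runs reproducible.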
Data Contracts
Use the dataset contracts in crl.data and follow the shape rules exactly:
| Data Type | Class | Use Case |
|---|---|---|
| Bandits | LoggedBanditDataset | Single-step contextual decisions |
| Trajectories | TrajectoryDataset | Episode-based sequential data |
| Transitions | TransitionDataset | Step-by-step (s, a, r, s') tuples |
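To show what a shape contract for logged bandit data might look like, here is a hypothetical sketch. `ToyBanditLog` and its field names are assumptions made for illustration; they are not the schema of `LoggedBanditDataset`, whose actual fields and shape rules are defined in `crl.data`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ToyBanditLog:
    """Hypothetical logged-bandit container with shape checks."""

    contexts: np.ndarray        # shape (n, d)
    actions: np.ndarray         # shape (n,), integer action indices
    rewards: np.ndarray         # shape (n,)
    behavior_probs: np.ndarray  # shape (n,), probability of the logged action

    def __post_init__(self):
        if self.contexts.ndim != 2:
            raise ValueError("contexts must have shape (n, d)")
        n = self.contexts.shape[0]
        for name in ("actions", "rewards", "behavior_probs"):
            arr = getattr(self, name)
            if arr.shape != (n,):
                raise ValueError(f"{name} must have shape ({n},), got {arr.shape}")
        if not np.issubdtype(self.actions.dtype, np.integer):
            raise ValueError("actions must be integer indices")
        if np.any(self.behavior_probs <= 0) or np.any(self.behavior_probs > 1):
            raise ValueError("behavior_probs must lie in (0, 1]")


# A valid log with n=4 samples and d=2 context features.
log = ToyBanditLog(
    contexts=np.zeros((4, 2)),
    actions=np.array([0, 1, 2, 1]),
    rewards=np.array([0.0, 1.0, 0.5, 1.0]),
    behavior_probs=np.array([0.5, 0.3, 0.2, 0.3]),
)
print(log.contexts.shape, log.actions.shape)
```

Validating shapes at construction time means a mismatch surfaces when the data is loaded, before any estimator runs on it.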
Learn by Example
- Getting Started
- Tutorials
- Reference
- Results
Estimator Selection
Not sure which estimator to use? See the Estimator Selection Guide for a practical decision tree and recommended defaults.