
CausalRL


Estimand-first causal RL and off-policy evaluation

Know what you're estimating. Know when to trust it. Know how it was produced.


Package: causalrl · Import: crl · Version: 0.2.0 · GitHub


Why CausalRL?

  • Estimand-first Design

    Every estimator is tied to a formal estimand with explicit identification assumptions. Know what you're estimating.

  • Diagnostics by Default

    Overlap, ESS, weight tails, and shift checks run automatically. Know when to trust your estimates.

  • 20+ Estimators

    IS, WIS, DR, WDR, MAGIC, MRDR, MIS, FQE, DualDICE, GenDICE, DRL—all in a unified pipeline.

  • Sensitivity Analysis

    Bounded-confounding curves for bandits and sequential settings. Quantify robustness to hidden confounders.

  • D4RL & RL Unplugged

    Built-in adapters for standard RL benchmarks. Load datasets with one line of code.

  • Audit-Ready Reports

    HTML reports with tables, figures, and full metadata bundles. Share reproducible results.

  • Ground-Truth Benchmarks

    Synthetic bandit/MDP suites with known true values. Validate estimators before deployment.

  • Production Ready

    Type-checked, tested, with deterministic seeding throughout. Built for research reliability.


60-Second Quickstart

# Install from PyPI
pip install causalrl

# With all optional extras
pip install "causalrl[all]"

# Or install from source
git clone https://github.com/gsaco/causalrl
cd causalrl
pip install -e .

from crl.benchmarks.bandit_synth import SyntheticBandit, SyntheticBanditConfig
from crl.ope import evaluate_ope

# Create benchmark with known ground truth
benchmark = SyntheticBandit(SyntheticBanditConfig(seed=0))
dataset = benchmark.sample(num_samples=1000, seed=1)

# Run end-to-end evaluation
report = evaluate_ope(dataset=dataset, policy=benchmark.target_policy)

# View results and generate report
print(report.summary_table())
report.save_html("report.html")

# Bandit OPE demo
python -m examples.quickstart.bandit_ope

# MDP evaluation
python -m examples.quickstart.mdp_ope

# Full benchmark suite
python -m experiments.run_benchmarks --suite all --out results/

Scope

The current evaluation pipeline assumes discrete action spaces for importance sampling estimators. See Limitations for details on continuous actions.
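To make the discrete-action assumption concrete, here is a standalone NumPy sketch of the two simplest estimators from the list above (ordinary IS and self-normalized WIS) on one-step bandit data. This is an illustration of the technique, not the causalrl implementation.

```python
import numpy as np

def is_and_wis(rewards, behavior_probs, target_probs):
    """Ordinary and weighted (self-normalized) importance sampling
    for one-step logged bandit data with discrete actions."""
    w = target_probs / behavior_probs          # per-sample importance weights
    is_est = np.mean(w * rewards)              # IS: unbiased, higher variance
    wis_est = np.sum(w * rewards) / np.sum(w)  # WIS: biased, usually lower variance
    return is_est, wis_est

# Toy log: behavior policy uniform over 2 actions, target always picks action 1,
# reward is 1 exactly when action 1 was taken (so the true target value is 1).
rng = np.random.default_rng(0)
actions = rng.integers(0, 2, size=1000)
rewards = (actions == 1).astype(float)
behavior_probs = np.full(1000, 0.5)
target_probs = (actions == 1).astype(float)    # target puts zero mass on action 0

is_est, wis_est = is_and_wis(rewards, behavior_probs, target_probs)
```

Both estimates should land near the true value of 1; WIS hits it exactly here because weights and weighted rewards coincide.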


The Three Pillars

Pillar | Why It Matters | What You Get
Estimands | Know what quantity you're estimating, not just which estimator | Explicit estimands with identification assumptions via AssumptionSet
Diagnostics | Know when an estimate is fragile before acting on it | Overlap checks, ESS, weight tails, shift diagnostics, sensitivity curves
Evidence | Know how results were produced for auditing | Versioned configs, deterministic seeds, structured report bundles
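The ESS diagnostic mentioned under the Diagnostics pillar has a simple closed form. Here is a minimal standalone sketch of the Kish effective sample size, shown only to illustrate the idea; the library's own diagnostics may differ.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: ESS = (sum w)^2 / sum(w^2).
    Near n means good overlap; near 1 means a few samples dominate."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

uniform = np.ones(1000)                  # perfect overlap: ESS == n
skewed = np.r_[np.ones(999), 1000.0]     # one sample carries most of the weight

ess_uniform = effective_sample_size(uniform)
ess_skewed = effective_sample_size(skewed)
```

With uniform weights the ESS equals the sample size (1000); with one dominant weight it collapses to about 4, a clear signal not to trust the point estimate.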

See the full Results Gallery


Why Trust CausalRL?

  • Explicit Assumptions

    Every estimand declares its identification assumptions via AssumptionSet—no hidden requirements.

  • Deterministic Benchmarks

    Synthetic generators with fixed seeds produce identical results across runs.

  • Comprehensive Testing

    Test suite covering estimators, diagnostics, and full pipeline integration.

  • Docs ↔ Code Parity

    Automated checks keep formulas and APIs aligned with documentation.
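The deterministic-benchmarks claim rests on a standard property of seeded generators: the same seed yields bitwise-identical draws. A quick standalone demonstration with NumPy (not crl-specific code):

```python
import numpy as np

# Two generators constructed with the same seed produce identical streams,
# which is what makes fixed-seed synthetic benchmarks reproducible across runs.
a = np.random.default_rng(42).normal(size=5)
b = np.random.default_rng(42).normal(size=5)
assert np.array_equal(a, b)
```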


Data Contracts

Use the dataset contracts in crl.data and follow the shape rules exactly:

Data Type | Class | Use Case
Bandits | LoggedBanditDataset | Single-step contextual decisions
Trajectories | TrajectoryDataset | Episode-based sequential data
Transitions | TransitionDataset | Step-by-step (s, a, r, s') tuples

Dataset Format and Validation
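To illustrate the kind of shape rules a logged bandit contract enforces, here is a minimal validation sketch. The field names and shapes below are illustrative assumptions, not the actual LoggedBanditDataset API; consult the Dataset Format and Validation page for the real contract.

```python
import numpy as np

def validate_logged_bandit(contexts, actions, rewards, propensities):
    """Minimal shape/value checks in the spirit of a logged-bandit
    data contract (illustrative only)."""
    n = len(rewards)
    assert contexts.shape[0] == n, "one context row per logged decision"
    assert actions.shape == (n,) and np.issubdtype(actions.dtype, np.integer)
    assert rewards.shape == (n,)
    assert propensities.shape == (n,)
    assert np.all(propensities > 0), "behavior policy needs positive support"
    return True

ok = validate_logged_bandit(
    contexts=np.zeros((100, 4)),
    actions=np.zeros(100, dtype=np.int64),
    rewards=np.zeros(100),
    propensities=np.full(100, 0.25),
)
```

The positive-propensity check matters most: importance sampling weights divide by these values, so a zero would make the estimand unidentified.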


Learn by Example


Estimator Selection

Not sure which estimator to use? See the Estimator Selection Guide for a practical decision tree and recommended defaults.