Core Concepts
This page summarizes the core objects you will use in CausalRL and how they connect in the off-policy evaluation (OPE) workflow. Every name below refers to an actual class or function in the package.
Datasets
CausalRL expects explicit dataset objects with validated shapes:
- LoggedBanditDataset (crl.data) for contextual bandits.
- TrajectoryDataset (crl.data) for finite-horizon MDP trajectories.
- TransitionDataset (crl.data) for (s, a, r, s', done) logs that can be grouped into trajectories.
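The TransitionDataset entry describes flat (s, a, r, s', done) logs that can be grouped into trajectories. A minimal sketch of that grouping in plain Python follows, assuming transitions are stored in logging order and that done=True marks the final step of an episode. The Transition tuple and the group_transitions helper are hypothetical illustrations, not part of crl.data.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    """One logged step (hypothetical container, not a crl.data type)."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


def group_transitions(transitions: List[Transition]) -> List[List[Transition]]:
    """Split a flat, time-ordered log into episodes at each done=True flag."""
    trajectories: List[List[Transition]] = []
    current: List[Transition] = []
    for step in transitions:
        current.append(step)
        if step.done:  # episode boundary: close the current trajectory
            trajectories.append(current)
            current = []
    if current:  # trailing steps without a terminal flag form a truncated episode
        trajectories.append(current)
    return trajectories


log = [
    Transition(0, 1, 0.0, 1, False),
    Transition(1, 0, 1.0, 2, True),
    Transition(0, 0, 0.5, 2, True),
]
episodes = group_transitions(log)
print([len(ep) for ep in episodes])  # [2, 1]
```

Trailing steps with no terminal flag are kept as a truncated episode here; a real loader might instead reject or pad them.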
Each dataset exposes .discount and .horizon, and optionally behavior_action_probs, which holds the logged behavior-policy propensities.
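To illustrate what "validated shapes" and logged propensities can mean in practice, the sketch below defines a hypothetical ToyBanditLog container (not CausalRL's LoggedBanditDataset) that checks per-sample lengths and propensity ranges at construction time. Setting horizon=1 for a bandit is an assumed convention for this example only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBanditLog:
    """Hypothetical stand-in for a logged bandit dataset (not crl's class)."""
    contexts: List[List[float]]
    actions: List[int]
    rewards: List[float]
    behavior_action_probs: List[float]  # pi_b(a_i | x_i) for each logged action
    discount: float = 1.0
    horizon: int = 1  # assumed convention: a bandit is a one-step problem

    def __post_init__(self) -> None:
        n = len(self.actions)
        # Every per-sample field must have one entry per logged decision.
        for name in ("contexts", "rewards", "behavior_action_probs"):
            length = len(getattr(self, name))
            if length != n:
                raise ValueError(f"{name} has length {length}, expected {n}")
        # Propensities must be valid, strictly positive probabilities.
        if any(not 0.0 < p <= 1.0 for p in self.behavior_action_probs):
            raise ValueError("behavior_action_probs must lie in (0, 1]")
        if not 0.0 <= self.discount <= 1.0:
            raise ValueError("discount must lie in [0, 1]")


log = ToyBanditLog(
    contexts=[[0.1], [0.7]], actions=[0, 1], rewards=[1.0, 0.0],
    behavior_action_probs=[0.5, 0.25],
)
try:
    ToyBanditLog(contexts=[[0.1]], actions=[0, 1], rewards=[1.0, 0.0],
                 behavior_action_probs=[0.5, 0.5])
except ValueError as err:
    print(err)  # contexts has length 1, expected 2
```

Failing fast at construction keeps shape and propensity errors out of the estimators, where they would otherwise surface as silently wrong weights.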
Policies
Policies return the probabilities they assign to observed actions:
- The Policy protocol (crl.policies) defines action_probs and action_prob.
- Use Policy.from_sklearn or Policy.from_torch to wrap existing models.
- Reference implementations include TabularPolicy, StochasticPolicy, and CallablePolicy.
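To make the protocol idea concrete, here is a sketch using Python's typing.Protocol. The method names action_probs and action_prob come from the list above, but the signatures used here (an integer state, plus an action index for action_prob) are assumptions for illustration; the real crl.policies signatures may differ. PolicyLike and ToyTabularPolicy are hypothetical.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class PolicyLike(Protocol):
    """Hypothetical mirror of a policy protocol; signatures are assumed."""

    def action_probs(self, state: int) -> List[float]:
        """Return pi(a | state) for every action a."""
        ...

    def action_prob(self, state: int, action: int) -> float:
        """Return pi(action | state) for one action, e.g. a logged one."""
        ...


class ToyTabularPolicy:
    """Stores one probability vector per discrete state (illustrative only)."""

    def __init__(self, table: Dict[int, List[float]]) -> None:
        for state, probs in table.items():
            if abs(sum(probs) - 1.0) > 1e-9:
                raise ValueError(f"probabilities for state {state} do not sum to 1")
        self.table = table

    def action_probs(self, state: int) -> List[float]:
        return list(self.table[state])

    def action_prob(self, state: int, action: int) -> float:
        return self.table[state][action]


target = ToyTabularPolicy({0: [0.9, 0.1], 1: [0.2, 0.8]})
print(isinstance(target, PolicyLike))  # True (structural check: both methods exist)
print(target.action_prob(1, 1))        # 0.8
```

With structural typing, any object that provides these two methods satisfies the protocol without inheriting from a base class.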
Estimands and assumptions
The central estimand is the policy value:
- PolicyValueEstimand binds the target policy to a horizon, a discount, and an AssumptionSet.
- Assumptions are made explicit via Assumption and AssumptionSet.
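For reference, the standard finite-horizon form of this quantity is shown below, with target policy π, behavior policy π_b, horizon H, discount γ, and rewards r_t, together with the textbook importance-weighted identity that holds under an overlap assumption (π_b(a | s) > 0 wherever π(a | s) > 0). This is the conventional definition, not a transcription of PolicyValueEstimand; in particular, indexing time from t = 0 is an assumption.

$$
V(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{H-1} \gamma^{t} r_{t}\right]
= \mathbb{E}_{\pi_b}\left[\left(\prod_{t=0}^{H-1} \frac{\pi(a_t \mid s_t)}{\pi_b(a_t \mid s_t)}\right) \sum_{t=0}^{H-1} \gamma^{t} r_{t}\right]
$$

The second equality is why overlap matters: if π puts mass on an action that π_b never takes, the ratio is undefined and logged data cannot identify V(π).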
Estimators
Estimators return an EstimatorReport with the value estimate, standard
error, confidence interval, diagnostics, and warnings. Examples include:
- Importance sampling: ISEstimator, WISEstimator, PDISEstimator
- Doubly robust: DoublyRobustEstimator, WeightedDoublyRobustEstimator
- Model-based: FQEEstimator
- Specialized: MAGICEstimator, MRDREstimator, DualDICEEstimator
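To illustrate what the importance-sampling family computes, here is a self-contained sketch of IS and WIS for the bandit case and per-decision IS (PDIS) for trajectories. It assumes the target and behavior probabilities of each logged action are already available and uses a normal-approximation 95% interval. The function names and the plain-dict output are hypothetical; they only loosely mirror the value, standard error, and interval fields of an EstimatorReport and do not reproduce the crl estimator classes.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _summarize(values: Sequence[float], z: float = 1.96) -> Dict[str, object]:
    """Sample mean, standard error, and a normal-approximation confidence interval."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    se = math.sqrt(var / n)
    return {"value": mean, "stderr": se, "ci": (mean - z * se, mean + z * se)}


def is_estimate(rewards: Sequence[float], target_probs: Sequence[float],
                behavior_probs: Sequence[float]) -> Dict[str, object]:
    """Vanilla IS: average of w_i * r_i with w_i = pi(a_i | x_i) / pi_b(a_i | x_i)."""
    terms = [r * p / q for r, p, q in zip(rewards, target_probs, behavior_probs)]
    return _summarize(terms)


def wis_estimate(rewards: Sequence[float], target_probs: Sequence[float],
                 behavior_probs: Sequence[float]) -> float:
    """Weighted (self-normalized) IS: sum(w_i * r_i) / sum(w_i)."""
    weights = [p / q for p, q in zip(target_probs, behavior_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


def pdis_estimate(trajectories: List[List[Tuple[float, float, float]]],
                  discount: float) -> Dict[str, object]:
    """Per-decision IS: the reward at step t is weighted by the ratios of steps 0..t only.

    Each step is a (reward, target_prob, behavior_prob) tuple for the logged action.
    """
    returns = []
    for trajectory in trajectories:
        rho, total = 1.0, 0.0
        for t, (r, p, q) in enumerate(trajectory):
            rho *= p / q  # cumulative likelihood ratio up to and including step t
            total += discount ** t * rho * r
        returns.append(total)
    return _summarize(returns)


rewards = [1.0, 0.0, 1.0, 0.5]
pi_e = [0.8, 0.2, 0.6, 0.5]   # target-policy probabilities of the logged actions
pi_b = [0.5, 0.5, 0.5, 0.25]  # logged behavior propensities
print(round(is_estimate(rewards, pi_e, pi_b)["value"], 3))  # 0.95
print(round(wis_estimate(rewards, pi_e, pi_b), 3))          # 0.731
trajs = [[(1.0, 0.5, 0.5), (1.0, 1.0, 0.5)], [(0.0, 0.5, 0.5)]]
print(round(pdis_estimate(trajs, discount=0.9)["value"], 3))  # 1.4
```

WIS accepts a small bias in exchange for lower variance, because its normalizer adapts to the realized weights. PDIS typically has lower variance than trajectory-wise IS because a reward is never weighted by ratios from actions taken after it.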
Diagnostics and reports
Diagnostics quantify overlap, effective sample size, and the tail behavior of importance weights. Individual estimator reports are aggregated into an OpeReport, which can render summary tables and HTML output.
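The weight diagnostics named above are often computed along the following lines. This sketch uses the Kish effective sample size (Σw)²/Σw², the largest single weight's share of the total, and the fraction of weights above a threshold as a crude tail indicator. The helper weight_diagnostics and its default threshold of 10 are hypothetical; the diagnostics and thresholds CausalRL actually reports may differ.

```python
from typing import Dict, Sequence


def weight_diagnostics(weights: Sequence[float], tail_threshold: float = 10.0) -> Dict[str, float]:
    """Summarize importance weights w_i = pi(a_i | x_i) / pi_b(a_i | x_i)."""
    if not weights:
        raise ValueError("need at least one weight")
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    ess = total ** 2 / sum_sq if sum_sq > 0 else 0.0  # Kish effective sample size
    return {
        "ess": ess,
        "ess_fraction": ess / n,  # near 1: balanced weights; near 0: a few samples dominate
        "max_weight_share": max(weights) / total if total > 0 else 0.0,
        "tail_fraction": sum(w > tail_threshold for w in weights) / n,
    }


print(weight_diagnostics([1.0, 1.0, 1.0, 1.0]))
# {'ess': 4.0, 'ess_fraction': 1.0, 'max_weight_share': 0.25, 'tail_fraction': 0.0}
print(weight_diagnostics([0.1, 0.1, 0.1, 20.0])["ess"])  # roughly 1.03: one weight dominates
```

Poor overlap shows up indirectly here: when the behavior policy rarely takes actions the target policy favors, a handful of large weights drive the ESS toward 1.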
End-to-end pipeline
For most workflows, you can simply call evaluate.
This selects default estimators for the dataset type and returns an OpeReport
with diagnostics and metadata.
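Selecting default estimators by dataset type is a dispatch-on-type pattern. The sketch below shows that pattern with functools.singledispatch and returns a report-like skeleton with metadata. Everything in it is hypothetical: BanditLog, TrajectoryLog, default_estimators, plan_evaluation, and especially the estimator choices, which are invented for illustration and are not the defaults or signature of evaluate.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Dict, List


@dataclass
class BanditLog:
    """Hypothetical stand-in for a logged bandit dataset."""
    n: int


@dataclass
class TrajectoryLog:
    """Hypothetical stand-in for a trajectory dataset."""
    n_episodes: int
    horizon: int
    discount: float


@singledispatch
def default_estimators(dataset) -> List[str]:
    """Fallback: unregistered dataset types have no defaults."""
    raise TypeError(f"no default estimators for {type(dataset).__name__}")


@default_estimators.register
def _(dataset: BanditLog) -> List[str]:
    # Invented defaults: one-step data needs no per-decision weighting.
    return ["importance sampling", "weighted importance sampling"]


@default_estimators.register
def _(dataset: TrajectoryLog) -> List[str]:
    # Invented defaults: per-decision weighting exploits the time structure.
    return ["per-decision importance sampling", "fitted Q evaluation"]


def plan_evaluation(dataset) -> Dict[str, object]:
    """Return the selected estimators plus dataset metadata, like a report skeleton."""
    return {
        "estimators": default_estimators(dataset),
        "metadata": {"dataset_type": type(dataset).__name__, **vars(dataset)},
    }


print(plan_evaluation(TrajectoryLog(n_episodes=100, horizon=10, discount=0.99)))
```

Registering a new dataset type only requires another register call, so the selection logic never needs an if/elif chain over dataset classes.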