Assumptions and Identification

CausalRL requires explicit assumptions. These determine which estimators are valid and how to interpret results.

Core assumptions

Sequential ignorability: actions are conditionally independent of potential outcomes given observed history.
Overlap (positivity): the behavior policy assigns nonzero probability to every action that the target policy may take in a given state.
Markov (MDPs): given the current state and action, the next state and reward do not depend on earlier history.
Behavior policy known (BEHAVIOR_POLICY_KNOWN): propensities are known or correctly specified.
Q-model realizability: the true action-value (Q) function lies in the chosen model class.
Bridge identifiability: the proximal bridge functions exist, and the equations that define them are well-posed.
Bounded rewards: rewards lie in a known finite range, which concentration bounds require.
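
Overlap is the assumption from the list above that is easiest to check directly from data. The following is a minimal sketch, assuming tabular policies stored as arrays of shape (n_states, n_actions). The function name check_overlap and the threshold eps are hypothetical, chosen for illustration, and are not part of the CausalRL API.

```python
import numpy as np


def check_overlap(pi_b, pi_e, eps=1e-3):
    """Flag (state, action) pairs where the target policy needs an action
    that the behavior policy takes with probability below eps.

    pi_b, pi_e: array-likes of shape (n_states, n_actions) holding action
    probabilities for the behavior and target policy (hypothetical layout).
    """
    pi_b = np.asarray(pi_b, dtype=float)
    pi_e = np.asarray(pi_e, dtype=float)
    if pi_b.shape != pi_e.shape:
        raise ValueError("pi_b and pi_e must have the same shape")

    # Actions the target policy may take in each state.
    needed = pi_e > 0
    # Overlap fails where a needed action is (nearly) never taken by pi_b.
    violations = needed & (pi_b < eps)

    # Importance ratios on needed pairs; a zero propensity yields inf.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.where(needed, pi_e / pi_b, 0.0)

    return {
        "ok": not bool(violations.any()),
        "violations": np.argwhere(violations).tolist(),
        "max_ratio": float(ratios.max()),
    }


if __name__ == "__main__":
    behavior = [[0.5, 0.5], [1.0, 0.0]]
    target = [[0.9, 0.1], [0.5, 0.5]]
    print(check_overlap(behavior, target))
```

A large but finite max_ratio does not violate overlap, but it usually signals high-variance importance weights, so it is worth reporting alongside the pass/fail flag.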

Assumptions are the bridge from data to causal statements. Without them, the estimate is a number without a guarantee.
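
As one illustration of how assumptions turn data into a causal statement, consider the standard trajectory-wise importance sampling identity. The notation here is assumed for this sketch and is not taken from CausalRL: $\pi_b$ is the behavior policy, $\pi_e$ the target policy, $T$ the horizon, $\gamma$ the discount factor, and $\tau$ a trajectory of states $s_t$, actions $a_t$, and rewards $r_t$. Under sequential ignorability, overlap, the Markov property, and known propensities:

```latex
V(\pi_e)
  = \mathbb{E}_{\tau \sim \pi_b}\!\left[
      \left( \prod_{t=0}^{T-1} \frac{\pi_e(a_t \mid s_t)}{\pi_b(a_t \mid s_t)} \right)
      \sum_{t=0}^{T-1} \gamma^{t} r_t
    \right]
```

Each assumption does visible work: ignorability lets observed rewards stand in for potential outcomes, overlap keeps every ratio finite, and known propensities make the denominator computable. If any one of them fails, the right-hand side can still be computed but no longer equals the target policy's value.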