Assumptions and Identification

CausalRL requires assumptions to be stated explicitly. They determine which estimators are valid and how the results should be interpreted.

Core assumptions

  • Sequential ignorability: actions are conditionally independent of potential outcomes given observed history.
  • Overlap (positivity): the behavior policy assigns nonzero probability to every action that the target policy may take.
  • Markov (MDPs): the future depends only on the current state and action.
  • Behavior policy known (BEHAVIOR_POLICY_KNOWN): propensities are known or correctly specified.
  • Q-model realizability: the true value function lies in the chosen model class.
  • Bridge identifiability: proximal bridge functions are well-posed.
  • Bounded rewards: required for concentration bounds.
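
Of the assumptions above, overlap is one that can be checked directly when both policies are available as tabular action probabilities. The following is a minimal sketch under that assumption, not part of the CausalRL API: the function name `overlap_violations`, its arguments, and the `eps` threshold are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def overlap_violations(
    behavior_probs: Sequence[Sequence[float]],
    target_probs: Sequence[Sequence[float]],
    eps: float = 0.0,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: list (state, action) pairs that violate overlap.

    A pair violates overlap when the target policy puts positive
    probability on the action while the behavior policy's probability
    is at most `eps`. Rows are indexed by state, columns by action.
    """
    violations = []
    for s, (b_row, t_row) in enumerate(zip(behavior_probs, target_probs)):
        for a, (b, t) in enumerate(zip(b_row, t_row)):
            if t > 0.0 and b <= eps:
                violations.append((s, a))
    return violations


# Toy example: two states, two actions (made-up numbers).
behavior = [[0.5, 0.5], [1.0, 0.0]]
target = [[0.9, 0.1], [0.2, 0.8]]

# In state 1 the target policy takes action 1, which the behavior
# policy never takes, so importance weights there are undefined.
print(overlap_violations(behavior, target))  # [(1, 1)]
```

Raising `eps` above zero flags near-violations as well, which may be useful because very small behavior probabilities produce very large importance weights even when overlap holds formally.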

What estimators require

  • IS/WIS/PDIS: sequential ignorability + overlap.
  • DR/WDR/MRDR/MAGIC: sequential ignorability + overlap + Markov.
  • DRL: sequential ignorability + overlap + Markov.
  • FQE: sequential ignorability + overlap + Markov + Q-model realizability.
  • MIS: sequential ignorability + overlap + Markov.
  • DualDICE/GenDICE: sequential ignorability + Markov + discrete state support.
  • HCOPE: sequential ignorability + overlap + bounded rewards.
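
The list above can be read as a lookup table from estimator to required assumptions. The sketch below encodes it that way, purely as an illustration: the string constants, the `REQUIRED` dictionary, and the `missing_assumptions` helper are hypothetical names and do not reflect how CausalRL represents assumptions internally.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical identifiers for the assumptions named in this section.
SEQ_IGN = "sequential_ignorability"
OVERLAP = "overlap"
MARKOV = "markov"
Q_REAL = "q_model_realizability"
BOUNDED = "bounded_rewards"
DISCRETE = "discrete_state_support"

_IS_LIKE = frozenset({SEQ_IGN, OVERLAP})
_MDP_LIKE = frozenset({SEQ_IGN, OVERLAP, MARKOV})

# Transcription of the "What estimators require" list.
REQUIRED: Dict[str, FrozenSet[str]] = {
    **{name: _IS_LIKE for name in ("IS", "WIS", "PDIS")},
    **{name: _MDP_LIKE for name in ("DR", "WDR", "MRDR", "MAGIC", "DRL", "MIS")},
    "FQE": _MDP_LIKE | {Q_REAL},
    "DualDICE": frozenset({SEQ_IGN, MARKOV, DISCRETE}),
    "GenDICE": frozenset({SEQ_IGN, MARKOV, DISCRETE}),
    "HCOPE": _IS_LIKE | {BOUNDED},
}


def missing_assumptions(estimator: str, declared: Set[str]) -> Set[str]:
    """Return the assumptions `estimator` needs that were not declared."""
    return set(REQUIRED[estimator] - declared)


# Example: only sequential ignorability and overlap have been declared.
declared = {SEQ_IGN, OVERLAP}
supported = sorted(
    name for name in REQUIRED if not missing_assumptions(name, declared)
)
print(supported)                                      # ['IS', 'PDIS', 'WIS']
print(sorted(missing_assumptions("FQE", declared)))   # ['markov', 'q_model_realizability']
```

Keeping the requirements in one table makes it easy to see which estimators a given set of declared assumptions supports, and which extra assumption each remaining estimator would need.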

Why this matters

Assumptions are the bridge from data to causal statements. Without them, the estimate is a number without a guarantee.