Overlap (Positivity)
Assumption - The behavior policy assigns non-zero probability to actions the target policy takes.
Applies to - IS, WIS, PDIS, DR, WDR, MAGIC, MRDR, MIS, DRL, HCOPE, Double RL. FQE uses overlap for diagnostics but can run without propensities.
Definition - For every state-action pair that has positive probability under the target policy, the behavior policy's probability of taking that action in that state is bounded away from zero.
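The definition can be written out as a formula. The sketch below is one common way to state it. The symbols are assumptions introduced here and do not come from this page: $\pi_e$ is the target policy, $\pi_b$ the behavior policy, $\epsilon$ the lower bound, $\rho_t$ the per-step importance ratio, and $H$ the horizon.

```latex
% Overlap: wherever the target policy puts mass, the behavior policy
% puts at least epsilon of mass.
\pi_e(a \mid s) > 0 \;\Longrightarrow\; \pi_b(a \mid s) \ge \epsilon > 0

% Consequence for importance ratios: since \pi_e(a \mid s) \le 1,
\rho_t = \frac{\pi_e(a_t \mid s_t)}{\pi_b(a_t \mid s_t)} \le \frac{1}{\epsilon},
\qquad
\prod_{t=0}^{H-1} \rho_t \le \epsilon^{-H}
```

The second line suggests why a small $\epsilon$ is a problem even when overlap technically holds: the worst-case trajectory weight grows exponentially in the horizon.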
Diagnostics - Overlap metrics and weight tail diagnostics.
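As an illustration of what such diagnostics might compute, here is a minimal sketch using NumPy. The function name `overlap_diagnostics`, its arguments, and the default `tail_threshold` are hypothetical and are not taken from this library's API. It assumes you already have, for each logged action, its probability under the behavior policy (`pi_b`) and under the target policy (`pi_e`).

```python
import numpy as np


def overlap_diagnostics(pi_b, pi_e, tail_threshold=10.0):
    """Hypothetical helper: summarize overlap from logged-action probabilities.

    pi_b, pi_e: 1-D arrays with the probability of each logged action under
    the behavior and target policies, respectively.
    tail_threshold: illustrative cutoff for calling a weight "large".
    """
    pi_b = np.asarray(pi_b, dtype=float)
    pi_e = np.asarray(pi_e, dtype=float)
    if pi_b.shape != pi_e.shape:
        raise ValueError("pi_b and pi_e must have the same shape")

    # Samples where the target policy acts but the (estimated) behavior
    # propensity is zero violate overlap outright.
    violations = (pi_b <= 0.0) & (pi_e > 0.0)

    # Importance weights are only defined where pi_b > 0.
    valid = pi_b > 0.0
    weights = pi_e[valid] / pi_b[valid]

    # Kish effective sample size: (sum w)^2 / sum(w^2).
    sq_sum = float(np.sum(weights ** 2))
    ess = float(np.sum(weights) ** 2 / sq_sum) if sq_sum > 0 else 0.0

    return {
        "min_behavior_prob": float(pi_b.min()) if pi_b.size else float("nan"),
        "support_violations": int(violations.sum()),
        "max_weight": float(weights.max()) if weights.size else 0.0,
        "ess": ess,
        "tail_fraction": float(np.mean(weights > tail_threshold)) if weights.size else 0.0,
    }


if __name__ == "__main__":
    # One logged action is rare under the behavior policy but certain
    # under the target policy, so a single weight dominates.
    print(overlap_diagnostics(pi_b=[0.01, 0.5, 0.5, 0.5], pi_e=[1.0, 0.5, 0.5, 0.5]))
```

An effective sample size far below the number of samples, or a large share of weights above the threshold, is a sign that a few samples dominate the estimate.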
Failure modes - When overlap is weak, importance weights explode and estimator variance dominates the error.
References - Robins, Hernán, and Brumback (2000).