Estimands

Estimands for causal reinforcement learning.

PolicyContrastEstimand dataclass

Contrast between two policy values.

Estimand

V^{\pi_{treatment}} - V^{\pi_{control}}.

Assumptions: Same as PolicyValueEstimand for both policies.

Inputs:
  treatment: Target policy value estimand.
  control: Control policy value estimand.

Outputs: Contrast specification used by estimators or reports.

Failure modes: If the two estimands carry different assumptions, the contrast may not be identified.
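The failure mode above can be illustrated with a minimal, self-contained sketch. The class names `ValueSketch` and `ContrastSketch`, their fields, and the choice to raise `ValueError` are all hypothetical stand-ins for illustration; they are not this package's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSketch:
    """Hypothetical stand-in for a policy value estimand."""
    policy_name: str
    discount: float
    assumptions: frozenset


@dataclass(frozen=True)
class ContrastSketch:
    """Hypothetical stand-in for a contrast between two policy values."""
    treatment: ValueSketch
    control: ValueSketch

    def assumptions_match(self) -> bool:
        # The contrast is only meaningful if both sides rest on the same assumptions.
        return self.treatment.assumptions == self.control.assumptions

    def contrast(self, v_treatment: float, v_control: float) -> float:
        # Assumed behaviour: refuse to form the contrast when assumptions differ.
        if not self.assumptions_match():
            raise ValueError("treatment and control estimands declare different assumptions")
        return v_treatment - v_control


shared = frozenset({"sequential_ignorability", "overlap"})
new_policy = ValueSketch("new", 0.5, shared)
old_policy = ValueSketch("old", 0.5, shared)
spec = ContrastSketch(treatment=new_policy, control=old_policy)
difference = spec.contrast(1.5, 1.0)  # V^{pi_treatment} - V^{pi_control}
```

Checking the assumptions before subtracting keeps a mismatched pair from silently producing a number that looks like a causal contrast.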

to_dict()

Return a dictionary representation.

PolicyValueEstimand dataclass

Policy value estimand under intervention.

Estimand

V^{\pi} = E[\sum_t \gamma^t R_t \mid do(A_t \sim \pi(\cdot \mid S_t))].

Assumptions: Sequential ignorability, positivity/overlap, and correct data contract.

Inputs:
  policy: Target policy.
  discount: Discount factor.
  horizon: Optional horizon for finite episodes.
  assumptions: AssumptionSet describing identification conditions.

Outputs: Estimand specification used by estimators.

Failure modes: If required assumptions are missing, estimators should refuse to run.
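As a worked illustration of the estimand, the discounted sum inside the expectation can be computed per episode and averaged. This sketch assumes the episodes were generated by running the target policy itself, so the intervention holds by construction; the function names `discounted_return` and `monte_carlo_value` are hypothetical and not part of this package.

```python
from typing import Optional, Sequence


def discounted_return(
    rewards: Sequence[float], discount: float, horizon: Optional[int] = None
) -> float:
    """Compute sum_t discount^t * R_t for one episode, truncated at `horizon` if given."""
    if horizon is not None:
        rewards = rewards[:horizon]
    return sum((discount ** t) * r for t, r in enumerate(rewards))


def monte_carlo_value(
    episodes: Sequence[Sequence[float]], discount: float, horizon: Optional[int] = None
) -> float:
    """Average the per-episode discounted returns (valid only for on-policy episodes)."""
    if not episodes:
        raise ValueError("at least one episode is required")
    returns = [discounted_return(ep, discount, horizon) for ep in episodes]
    return sum(returns) / len(returns)


# Two short reward sequences with discount 0.5.
value = monte_carlo_value([[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]], discount=0.5)
```

With logged data from a different behaviour policy, this plain average would not estimate the estimand; that is where the identification assumptions listed above come in.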

require(names)

Require that assumptions include the specified names.
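A check of this kind might look like the following sketch. The function name `require_assumptions`, the use of a plain set of names, and the `ValueError` are assumptions made for illustration; the package's own `require` may differ in signature and exception type.

```python
from typing import Iterable, Set


def require_assumptions(declared: Set[str], names: Iterable[str]) -> None:
    """Raise if any name in `names` is absent from the declared assumptions."""
    missing = [name for name in names if name not in declared]
    if missing:
        # Failing loudly lets an estimator refuse to run instead of returning
        # a number that is not identified.
        raise ValueError("Missing required assumptions: " + ", ".join(missing))


declared = {"sequential_ignorability", "overlap"}
require_assumptions(declared, ["overlap"])  # passes silently
```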

to_dict()

Return a dictionary representation of the estimand.

ProximalPolicyValueEstimand dataclass

Policy value estimand identified under proximal assumptions.

require(names)

Require that assumptions include the specified names.

SensitivityPolicyValueEstimand dataclass

Policy value estimand under a gamma-bounded confounding model.

compute_bounds(dataset)

Compute sensitivity bounds for the provided dataset.
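To give a feel for what a gamma-bounded model implies, here is a deliberately crude sketch; it is not the method used by `compute_bounds`. It assumes each true importance weight lies within a factor of the sensitivity parameter of its nominal value, that is in [w / gamma, w * gamma], and bounds an unnormalized importance-weighted mean of returns. The function name `box_sensitivity_bounds` and its arguments are hypothetical.

```python
from typing import Sequence, Tuple


def box_sensitivity_bounds(
    weights: Sequence[float], returns: Sequence[float], sensitivity_gamma: float
) -> Tuple[float, float]:
    """Bound mean(w_i * G_i) when each true weight lies in [w_i / gamma, w_i * gamma]."""
    if sensitivity_gamma < 1.0:
        raise ValueError("sensitivity_gamma must be at least 1")
    if not weights or len(weights) != len(returns):
        raise ValueError("weights and returns must be non-empty and equally long")
    lower = 0.0
    upper = 0.0
    for w, g in zip(weights, returns):
        shrunk, inflated = w / sensitivity_gamma, w * sensitivity_gamma
        # The objective is separable, so each term is pushed to its own extreme.
        lower += min(shrunk * g, inflated * g)
        upper += max(shrunk * g, inflated * g)
    n = len(weights)
    return lower / n, upper / n


bounds = box_sensitivity_bounds([1.0, 2.0], [1.0, -1.0], sensitivity_gamma=2.0)
```

Setting the parameter to 1 collapses the interval to the nominal estimate, and larger values widen it, which is the qualitative behaviour expected of any gamma-bounded sensitivity analysis.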

require(names)

Require that assumptions include the specified names.