Fitted Q Evaluation (FQE)
Implementation: crl.estimators.fqe.FQEEstimator
Assumptions
- Sequential ignorability
- Overlap/positivity
- Markov property
- Q-model realizability
Requires
- TrajectoryDataset
- behavior_action_probs: optional (used only for diagnostics)
Diagnostics to check
- model.q_model_mse
- model.bellman_residual_mse
- overlap.support_violations (if propensities provided)
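Formula
FQE fits a Q-function by iterative Bellman regression on logged data: at each iteration, $\hat Q_{k+1}$ is regressed onto the targets $r + \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s')}[\hat Q_k(s', a')]$ over the logged transitions $(s, a, r, s')$. It then estimates $V^\pi$ by averaging $\hat V(s_0) = \mathbb{E}_{a \sim \pi(\cdot \mid s_0)}[\hat Q(s_0, a)]$ over the logged initial states $s_0$.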
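The procedure can be illustrated with a minimal tabular sketch. This is not the FQEEstimator implementation: the function names, the `(s, a, r, s_next, done)` tuple layout, and the tabular Q-model are assumptions chosen for illustration. In the tabular case, the least-squares regression step reduces to averaging the Bellman targets within each state-action cell.

```python
import numpy as np

def fitted_q_evaluation(transitions, target_policy, n_states, n_actions,
                        gamma=0.9, n_iters=200):
    """Tabular FQE sketch (hypothetical helper, not the crl API).

    transitions: iterable of (s, a, r, s_next, done) tuples.
    target_policy: array of shape (n_states, n_actions) with action probabilities.
    """
    s, a, r, s2, done = (np.asarray(x) for x in zip(*transitions))
    done = done.astype(float)
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Expected next-state value under the target policy.
        v_next = (target_policy[s2] * q[s2]).sum(axis=1)
        # Bellman regression targets; no bootstrapping past terminal steps.
        targets = r + gamma * (1.0 - done) * v_next
        # Tabular least squares: average the targets per (s, a) cell.
        sums = np.zeros_like(q)
        counts = np.zeros_like(q)
        np.add.at(sums, (s, a), targets)
        np.add.at(counts, (s, a), 1.0)
        q_new = q.copy()  # cells never observed keep their previous value
        seen = counts > 0
        q_new[seen] = sums[seen] / counts[seen]
        q = q_new
    return q

def estimate_value(q, target_policy, initial_states):
    """Average V_hat(s0) = sum_a pi(a | s0) * Q_hat(s0, a) over initial states."""
    v = (target_policy * q).sum(axis=1)
    return float(v[np.asarray(initial_states)].mean())
```

Cells that never appear in the logged data keep their initial value of zero here, which is a small-scale instance of the extrapolation problem: the estimate for unobserved state-action pairs is determined by the model, not by the data.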
Uncertainty
- Normal-approximation CI by default.
- Block bootstrap recommended for sequential dependence (bootstrap=True).
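As a rough illustration of a normal-approximation interval, the sketch below builds a CI from per-trajectory values of $\hat V(s_0)$. How the estimator computes its default interval is not stated here, so treat this as an assumption; `normal_ci` is a hypothetical helper. It also treats the fitted Q-function as fixed, so it does not account for uncertainty in the Q-model fit.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_ci(values, alpha=0.05):
    """Two-sided (1 - alpha) normal-approximation CI for the mean of `values`."""
    n = len(values)
    center = mean(values)
    # Standard error of the mean, using the sample standard deviation.
    se = stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return center - z * se, center + z * se
```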
Failure modes
- Extrapolation error for out-of-distribution state-action pairs.
- Sensitive to model capacity and optimization.
Minimal example
Bootstrap notes
- IID bootstrap ignores temporal dependence and can be optimistic.
- Trajectory or block bootstraps are preferable for sequential data.
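The difference from an IID bootstrap can be sketched as follows: whole trajectories are resampled with replacement, so the transitions within an episode stay together. This is a generic percentile bootstrap, not the library's bootstrap=True code path; `trajectory_bootstrap_ci` and the `estimate_fn` callback are hypothetical names.

```python
import numpy as np

def trajectory_bootstrap_ci(trajectories, estimate_fn, n_boot=500,
                            alpha=0.05, seed=0):
    """Percentile bootstrap CI that resamples whole trajectories.

    estimate_fn: callable mapping a list of trajectories to a scalar estimate
    (for FQE, this would refit the Q-model on the resample and return V_hat).
    """
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw trajectory indices with replacement; each episode stays intact.
        idx = rng.integers(0, n, size=n)
        stats[b] = estimate_fn([trajectories[i] for i in idx])
    lo, hi = np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)
```

Refitting the Q-model inside `estimate_fn` for every resample is expensive, but it is what lets the interval reflect model-fitting variability as well as sampling variability in the initial states.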
References
- Le et al. (2019)
- Hao et al. (2021) for bootstrap inference