MDP OPE Tutorial (Script)
This tutorial mirrors examples/quickstart/mdp_ope.py and focuses on
trajectory-based off-policy evaluation (OPE) estimators in a finite-horizon
Markov decision process (MDP).
Run the script
Assuming a standard checkout with the package installed, run the script from the repository root with: python examples/quickstart/mdp_ope.py
What it does
- Generates synthetic trajectories with known ground truth.
- Declares Markov and overlap assumptions.
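- Compares importance sampling (IS), weighted importance sampling (WIS), per-decision importance sampling (PDIS), doubly robust (DR), and fitted Q evaluation (FQE) estimators.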
- Prints a summary table for quick inspection.
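The importance-weighting estimators in the list above can be illustrated in a few lines of NumPy. This is a minimal sketch, not the implementation in examples/quickstart/mdp_ope.py: the function name `is_wis_pdis`, the `(n_trajectories, horizon)` array layout, and the toy state-free two-action problem are all assumptions made for illustration. DR and FQE are omitted because they additionally need a fitted value model.

```python
import numpy as np


def is_wis_pdis(ratios, rewards, gamma=1.0):
    """Hypothetical helper: IS, WIS, and PDIS value estimates.

    ratios  : (n, T) per-step ratios pi_e(a_t | s_t) / pi_b(a_t | s_t)
    rewards : (n, T) observed rewards
    """
    n, T = rewards.shape
    discounts = gamma ** np.arange(T)
    cum_w = np.cumprod(ratios, axis=1)            # weight through step t
    full_w = cum_w[:, -1]                         # full-trajectory weight
    returns = (rewards * discounts).sum(axis=1)   # discounted return

    is_est = np.mean(full_w * returns)
    wis_est = np.sum(full_w * returns) / np.sum(full_w)
    # PDIS weights each reward only by the ratios up to its own step.
    pdis_est = np.mean((cum_w * rewards * discounts).sum(axis=1))
    return {"IS": is_est, "WIS": wis_est, "PDIS": pdis_est}


# Toy data (assumed, not the script's generator): no state, two actions,
# reward 1 for action 1 and 0 for action 0.
rng = np.random.default_rng(0)
n, T = 5000, 5
pi_b = np.array([0.5, 0.5])   # behavior policy
pi_e = np.array([0.3, 0.7])   # target policy
actions = rng.choice(2, size=(n, T), p=pi_b)
rewards = actions.astype(float)
ratios = pi_e[actions] / pi_b[actions]

true_value = T * pi_e[1]      # known ground truth for the toy problem
estimates = is_wis_pdis(ratios, rewards)
for name, value in estimates.items():
    print(f"{name:5s} estimate={value:.3f}  truth={true_value:.3f}")
```

Because the ground truth is known in closed form here, each estimate can be compared against `true_value` directly, which is the same kind of check the script's summary table supports.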
Interpretation tips
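- Horizon effects: the effective sample size (ESS) typically shrinks at later time steps, because cumulative importance weights are products of per-step ratios and become more skewed as the horizon grows.
- DR/FQE: lower variance than pure importance weighting when the value model is well specified. FQE relies entirely on that model, so a misspecified model introduces bias.
- PDIS: more stable than full-trajectory IS over long horizons, because each reward is weighted only by the ratios up to its own time step.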
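The horizon effect noted above can be checked numerically. The sketch below computes a per-step ESS using the common Kish formula, (sum of weights)^2 / (sum of squared weights), applied to cumulative importance weights. The helper name `ess_per_step` and the toy policies are assumptions for illustration, and the script may define ESS differently.

```python
import numpy as np


def ess_per_step(ratios):
    """Hypothetical helper: Kish ESS of the cumulative weights at each step.

    ratios : (n, T) per-step importance ratios
    returns: (T,) array, ESS of the weights accumulated through step t
    """
    cum_w = np.cumprod(ratios, axis=1)
    return cum_w.sum(axis=0) ** 2 / (cum_w ** 2).sum(axis=0)


# Toy data (assumed): state-free problem with two actions.
rng = np.random.default_rng(0)
n, T = 2000, 20
pi_b = np.array([0.5, 0.5])   # behavior policy
pi_e = np.array([0.3, 0.7])   # target policy
actions = rng.choice(2, size=(n, T), p=pi_b)
ratios = pi_e[actions] / pi_b[actions]

ess = ess_per_step(ratios)
for t in (0, 4, 9, 19):
    print(f"t={t:2d}  ESS={ess[t]:8.1f} of {n}")
```

When the two policies coincide, every ratio is 1 and the ESS equals the number of trajectories at every step. The further the target policy departs from the behavior policy, the faster the ESS falls with the horizon.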
Next steps
- See the long-horizon comparison notebook: notebooks/11_mdp_long_horizon_comparison.ipynb
- Review estimator selection guidance: Estimator Selection Guide