Precup, D., Sutton, R. S., and Singh, S. (2000). Eligibility Traces for Off-Policy Policy Evaluation. ICML.
Mahmood, A. R., van Hasselt, H., and Sutton, R. S. (2014). Weighted Importance Sampling for Off-Policy Learning with Linear Function Approximation. NeurIPS.
Jiang, N. and Li, L. (2016). Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning. ICML.
Thomas, P. S. and Brunskill, E. (2016). Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. ICML.
Farajtabar, M., Chow, Y., and Ghavamzadeh, M. (2018). More Robust Doubly Robust Off-Policy Evaluation. ICML.
Xie, T., Ma, Y., and Wang, Y.-X. (2019). Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling. NeurIPS.
Nachum, O., Chow, Y., Dai, B., and Li, L. (2019). DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections. NeurIPS.
Kallus, N. and Uehara, M. (2020). Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes. Journal of Machine Learning Research.
Thomas, P. S., Theocharous, G., and Ghavamzadeh, M. (2015). High-Confidence Off-Policy Evaluation. AAAI.
Hao, B., Ji, X., Duan, Y., Lu, H., Szepesvári, C., and Wang, M. (2021). Bootstrapping Statistical Inference for Off-Policy Evaluation. arXiv.
Zhang, R., Zhang, X., Ni, C., and Wang, M. (2022). Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory. ICML.
Uehara, M., Shi, C., and Kallus, N. (2022). A Review of Off-Policy Evaluation in Reinforcement Learning. arXiv.