Ir al contenido

Documat


A Moreau-Yosida regularization for Markov decision processes

  • Ortega-Gutiérrez, Israel [1] ; Cruz-Suárez, Hugo [1]
    1. [1] Benemérita Universidad Autónoma de Puebla.
  • Localización: Proyecciones: Journal of Mathematics, ISSN 0716-0917, ISSN-e 0717-6279, Vol. 40, Nº. 1, 2021, págs. 117-137
  • Idioma: inglés
  • DOI: 10.22199/issn.0717-6279-2021-01-0008
  • Enlaces
  • Resumen
    • This paper addresses a class of sequential optimization problems known as Markov decision processes. These kinds of processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions to guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process that conforms to the Markov control model of the original process except for the cost function induced via the Moreau-Yosida regularization is established. Compared to the original process, this new discounted Markov decision process has richer properties, such as the differentiability of its optimal value function, strictly convexity of the value function, uniqueness of optimal policy, and the optimal value function and the optimal policy of both processes, are the same. To complement the theory presented, an example is provided.

  • Referencias bibliográficas
    • C. D. Aliprantis and K. C. Border, Infinite dimensional analysis: a hitchhikers guide. Berlin: Springer, 2006, doi: 10.1007/3-540-29587-9
    • A. Belloni, Lecture notes for IAP 2005 course introduction to Bundle methods, 2005. [On line]. Available: https://bit.ly/3qbJfmb
    • D. P. Bertsekas, Dynamic programming and stochastic control. New York, NY: Academic Press, 1976.
    • J. M. Borwein and Q. J. Zhu, Techniques of variational analysis. New York: Springer, 2005.
    • P. G. Ciarlet, B. Miara, and J. M. Thomas, Introduction to Numerical Linear Algebra and Optimisation. New York, NY: Cambridge University Press,...
    • D. Cruz-Suárez, R. Montes-de-Oca, and F. Salem-Silva, “Conditions for the uniqueness of optimal policies of discounted Markov decision processes”,...
    • J. Dugundji, Topology. Boston, MA: Allyn and bacon, 1966. [On line]. Available: https://bit.ly/398XKQA
    • I. Ɖuranović-Miličić and M. Gardašević-Filipović, “On an Algorithm in nondifferential convex optimization”, Yugoslav journal of operations...
    • S. H. Friedberg, A. Insel, and L. Spence, Linear algebra. Upper Saddle River, NJ: Pearson, 2003.
    • M. Geist, B. Scherrer, O. Pietquin, “Theory of Regularized Markov Decision Processes”, Proceedings of machine learning research , vol. 97,...
    • O. Hernández-Lerma, Adaptive Markov control processes. New York, NY: Springer, 1989.
    • O. Hernández-Lerma and J. B. Lasserre, Markov control processes: basic optimality criteria: discrete-time. New York, NY: Springer, 1996.
    • O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes. New York, NY: Springer, 1999.
    • C. Lemaréchal and C. Sagastizábal, “Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries”, SIAM journal on optimization,...
    • R. T. Rockafellar, Convex analysis. Princeton, NJ: Princeton University Press, 1970.
    • R. T. Rockafellar and R. J.-B. Wets, Variational analysis. Heidelberg: Springer, 2004.
    • H. L. Royden, Real analysis. New York, NY: Macmillan, 1988.

Fundación Dialnet

Mi Documat

Opciones de artículo

Opciones de compartir

Opciones de entorno