Documat


Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors

  • Yi Zhang [1]
    1. [1] University of Liverpool, United Kingdom

  • Source: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 21, No. 2, 2013, pp. 378-408
  • Language: English
  • Abstract
    • In this paper we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with N constraints. Unlike the classic discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, the original constrained DTMDP is written as a convex program in the space of occupation measures, whose compactness and convexity we show. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. For the resulting convex program, we prove that it admits a solution that can be expressed as a convex combination of N+1 extreme points of the space of occupation measures. One of its consequences is the existence of a randomized stationary optimal policy for the original constrained DTMDP.

