Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors

Yi Zhang ^[1]
1. [1] University of Liverpool, United Kingdom
Location: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 21, No. 2, 2013, pp. 378-408
Language: English
Abstract
- In this paper we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with N constraints. Unlike the classic discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, the original constrained DTMDP is written as a convex program in the space of occupation measures, whose compactness and convexity we show. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. For the resulting convex program, we prove that it admits a solution that can be expressed as a convex combination of N+1 extreme points of the space of occupation measures. One of its consequences is the existence of a randomized stationary optimal policy for the original constrained DTMDP.
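
The central idea of the abstract can be illustrated on a small finite example. The sketch below is not taken from the paper: the two-state, two-action model, all of its numbers, and the helper names (`occupation_measure`, `balance_residual`, `mix`, `induced_policy`) are hypothetical. The form of the occupation measure, eta(x,a) = sum over t of E[prod_{s<t} alpha(X_s,A_s) 1{X_t=x, A_t=a}] with a state-action-dependent discount factor alpha, is likewise an assumption about how the non-constant discount enters. Under those assumptions, the code checks numerically that a convex combination of the occupation measures of two deterministic stationary policies is itself the occupation measure of a randomized stationary policy.

```python
# Toy finite model; every number here is made up for illustration only.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = [1.0, 0.0]  # initial distribution over states
# P[x][a][y]: probability of moving from state x to state y under action a
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
# ALPHA[x][a]: non-constant discount factor (assumed bounded away from 1)
ALPHA = [[0.9, 0.5], [0.7, 0.8]]


def occupation_measure(policy, horizon=2000):
    """Truncated-sum occupation measure of a stationary policy.

    policy[x][a] is the probability of choosing action a in state x.
    Returns eta[x][a], accumulated over `horizon` time steps.
    """
    eta = [[0.0 for _ in ACTIONS] for _ in STATES]
    weight = GAMMA[:]  # discounted mass sitting in each state at time t
    for _ in range(horizon):
        nxt = [0.0 for _ in STATES]
        for x in STATES:
            for a in ACTIONS:
                m = weight[x] * policy[x][a]
                eta[x][a] += m
                for y in STATES:
                    nxt[y] += m * ALPHA[x][a] * P[x][a][y]
        weight = nxt
    return eta


def balance_residual(eta):
    """Largest violation of the linear characterization
    eta(x, A) = gamma(x) + sum_{y,a} alpha(y,a) p(x|y,a) eta(y,a)."""
    worst = 0.0
    for x in STATES:
        lhs = sum(eta[x])
        rhs = GAMMA[x] + sum(ALPHA[y][a] * P[y][a][x] * eta[y][a]
                             for y in STATES for a in ACTIONS)
        worst = max(worst, abs(lhs - rhs))
    return worst


def mix(eta1, eta2, lam):
    """Convex combination lam * eta1 + (1 - lam) * eta2."""
    return [[lam * eta1[x][a] + (1.0 - lam) * eta2[x][a] for a in ACTIONS]
            for x in STATES]


def induced_policy(eta):
    """Randomized stationary policy phi(a|x) = eta(x,a) / eta(x,A).
    Assumes every state carries positive mass, as in this toy model."""
    return [[eta[x][a] / sum(eta[x]) for a in ACTIONS] for x in STATES]


# Two deterministic stationary policies: always action 0, always action 1.
d1 = [[1.0, 0.0], [1.0, 0.0]]
d2 = [[0.0, 1.0], [0.0, 1.0]]
eta1, eta2 = occupation_measure(d1), occupation_measure(d2)

eta_mix = mix(eta1, eta2, 0.5)
phi = induced_policy(eta_mix)
eta_phi = occupation_measure(phi)

# Largest entrywise gap between the mixture and the measure generated by phi.
gap = max(abs(eta_mix[x][a] - eta_phi[x][a]) for x in STATES for a in ACTIONS)
```

In this toy setting `gap` and the balance residuals are at the level of floating-point error, which mirrors the mechanism described in the abstract: the constraints on occupation measures are linear, so mixtures of extreme points remain feasible, and each such mixture can be realized by a randomized stationary policy. With N constraints, the abstract's result says a mixture of at most N+1 such extreme points suffices for optimality.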