Abstract
In this paper we develop the convex analytic approach to discounted discrete-time Markov decision processes (DTMDPs) in Borel state and action spaces with N constraints. Unlike classic discounted models, we allow a non-constant discount factor. After defining and characterizing the corresponding occupation measures, we rewrite the original constrained DTMDP as a convex program over the space of occupation measures, which we show to be convex and compact. In particular, we prove that every extreme point of the space of occupation measures can be generated by a deterministic stationary policy for the DTMDP. We then prove that the resulting convex program admits a solution expressible as a convex combination of N+1 extreme points of the space of occupation measures. As a consequence, the original constrained DTMDP admits a randomized stationary optimal policy.
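To make the central object concrete, the following is a minimal sketch (not from the paper) of a discounted occupation measure on a hypothetical 2-state, 2-action finite MDP with a constant discount factor; the names `P`, `pi`, `beta`, and `d0` are illustrative assumptions. It computes mu(x, a) = sum_t beta^t * Pr(x_t = x, a_t = a) under a randomized stationary policy by truncated forward iteration; the total mass is 1/(1 - beta).

```python
# Hypothetical finite example (illustrative only): occupation measure of a
# randomized stationary policy, computed by truncating the discounted series.
beta = 0.9                            # constant discount factor for simplicity
P = {                                 # P[x][a][y] = transition prob. x --a--> y
    0: {0: [1.0, 0.0], 1: [0.0, 1.0]},
    1: {0: [1.0, 0.0], 1: [0.0, 1.0]},
}
pi = {0: [0.5, 0.5], 1: [1.0, 0.0]}   # randomized stationary policy pi(a | x)
d0 = [1.0, 0.0]                       # initial state distribution
d = d0[:]                             # state distribution at time t

mu = {(x, a): 0.0 for x in (0, 1) for a in (0, 1)}
coef = 1.0                            # beta^t
for _ in range(500):                  # beta^500 is negligible, so truncate here
    for x in (0, 1):
        for a in (0, 1):
            mu[(x, a)] += coef * d[x] * pi[x][a]
    d = [sum(d[x] * pi[x][a] * P[x][a][y] for x in (0, 1) for a in (0, 1))
         for y in (0, 1)]
    coef *= beta

total = sum(mu.values())              # equals 1/(1 - beta) up to truncation
```

Up to truncation error, `mu` also satisfies the characterizing flow equation sum_a mu(y, a) = d0(y) + beta * sum_{x,a} mu(x, a) * P(x, a, y); it is such linear constraints that turn the constrained DTMDP into a convex program over occupation measures.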
Zhang, Y. Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP 21, 378–408 (2013). https://doi.org/10.1007/s11750-011-0186-8