Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors

Original Paper

Abstract

In this paper, we develop the convex analytic approach to a discounted discrete-time Markov decision process (DTMDP) in Borel state and action spaces with N constraints. Unlike the classical discounted models, we allow the discount factor to be non-constant. After defining and characterizing the corresponding occupation measures, we reformulate the original constrained DTMDP as a convex program over the space of occupation measures, which we show to be convex and compact. In particular, we prove that every extreme point of the space of occupation measures is generated by a deterministic stationary policy for the DTMDP. We then prove that the resulting convex program admits a solution expressible as a convex combination of N+1 extreme points of the space of occupation measures. As a consequence, the original constrained DTMDP admits a randomized stationary optimal policy.
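To make the construction concrete, the following display is a minimal sketch of the kind of occupation-measure reformulation described in the abstract, written in illustrative notation that is not taken from the paper: a Borel state space X, action space A, initial distribution γ, transition kernel p, state-action dependent discount factor α, cost functions c_0, …, c_N and constraint constants d_1, …, d_N are all assumed here for purposes of illustration; the precise definitions and conditions are those of the paper.

\mu^{\pi}(\Gamma) \;=\; \mathbb{E}^{\pi}_{\gamma}\!\left[\sum_{t=0}^{\infty}\Big(\prod_{s=0}^{t-1}\alpha(X_s,A_s)\Big)\,\mathbf{1}\{(X_t,A_t)\in\Gamma\}\right], \qquad \Gamma \in \mathcal{B}(X\times A),

\text{minimize}\ \int_{X\times A} c_0\,d\mu \ \ \text{over}\ \mu \in \mathcal{D}, \qquad \text{subject to}\ \int_{X\times A} c_j\,d\mu \le d_j,\quad j=1,\dots,N,

where \mathcal{D} denotes the set of occupation measures, characterized (in this sketch) by the balance equation

\mu(B\times A) \;=\; \gamma(B) + \int_{X\times A} \alpha(x,a)\,p(B\mid x,a)\,\mu(d(x,a)), \qquad B \in \mathcal{B}(X).

Under such a reformulation, a solution lying in the convex combination of at most N+1 extreme points of \mathcal{D}, each generated by a deterministic stationary policy, corresponds to the randomized stationary optimal policy whose existence the abstract asserts.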



Author information

Correspondence to Yi Zhang.


About this article

Cite this article

Zhang, Y. Convex analytic approach to constrained discounted Markov decision processes with non-constant discount factors. TOP 21, 378–408 (2013). https://doi.org/10.1007/s11750-011-0186-8
