Abstract
We give mild conditions for the existence of optimal solutions for a Markov decision problem with average cost, under m constraints of the same kind, in Borel actions and states spaces. Moreover, there is an optimal policy that is a convex combination of at most m+1 deterministic policies.
Similar content being viewed by others
References
Altman E (1999) Constraint Markov decision processes. Chapman & Hall/CRC, Boca Raton
Ash RB (1972) Real analysis and probability. Academic Press, London
Beutler FJ, Ross KW (1985) Optimal policies for controlled Markov chains with a constraint. J Math Anal Appl 112:236–252
Billingsley P (1995) Probability and measure, 3rd edn. Wiley–Interscience, New York
Billingsley P (1999) Convergence of probability measures, 2nd edn. Wiley–Interscience, New York
Borkar VS (1994) Ergodic control of Markov chains with constraints—the general case. SIAM J Control Optim 32:176–186
Bourbaki N (1969) Éléments de mathématique, chapitre IX. Hermann, Paris
Collins EJ, McNamara JM (1998) Finite-horizon dynamic optimization when the terminal reward is a concave function of the distribution of the final state. Adv Appl Probab 30:122–136
Feimberg EA, Shwartz A (1996) Constrained discounted dynamic programming. Math Oper Res 21:922–945
Frid EB (1972) On optimal strategies in control problems with constraints. Theory Probab Appl 17:188–192
González-Hernández J, Hernández-Lerma O (2005) Extreme points of sets randomized strategies in constrained optimization and control problems. SIAM J Optim 15(4):1085–1104
Haviv M (1996) On constrained Markov decision processes. Oper Res Lett 19(1):25–28
Hernández-Lerma O, González-Hernández J (2000) Constrained Markov control processes in Borel spaces: the discounted case. Math Methods Oper Res 52:271–285
Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes: basic optimality criteria. Springer, New York
Hernández-Lerma O, González-Hernández J, López-Martínez RR (2003) Constrained average cost Markov control processes in Borel space. SIAM J Control Optim 42(2):442–468
Hinderer K (1970) Foundations of non-stationary dynamic programming discrete-time parameter. Lecture notes oper res math syst, vol 33. Springer, Berlin
Hu Q, Yue W (2008) Markov decision processes with their applications. Springer, New York
Karr AF (1983) Extreme points of certain sets of probability measures with applications. Math Oper Res 8:74–85
Kurano M, Nakagami J, Huang Y (2000a) Constrained Markov decision processes with compact state and action spaces: the average case. Optimization 48(2):255–269
Kurano M, Yasuda M, Nakagami J, Yoshida Y (2000b) A fuzzy treatment of uncertain Markov decision processes: average case. In Proceedings of ASSM2000 International Conference on Applied Stochastic System Modeling, Kyoto, pp 148–157
Loève M (1977) Probability theory, vol 1, 4th edn. Springer, New York
Munkres JR (1975) Topology: a first course. Prentice–Hall, New Jersey
Phelps RR (1966) Lectures on Choquet’s theorem. Van Nostrand, New York
Piunovskiy AB (1993) Control of random sequences in problems with constraints. Theory Probab Appl 38(4):751–762
Piunovskiy AB (1997) Optimal control of random sequences in problems with constraints. Kluwer Academic, Dordrecht
Piunovskiy AB, Khametov VM (1991) Optimal control by random sequences with constraints. Math Notes 49:654–656
Sennott LI (1991) Constrained discounted Markov decision chains. Probab Eng Inf Sci 5:463–475
Sennott LI (1993) Constrained average cost Markov decision chains. Probab Eng Inf Sci 7:69–83
Tanaka K (1991) On discounted dynamic programming with constraints. J Math Anal Appl 155:264–277
Winkler G (1988) Extreme points of moment sets. Math Oper Res 13:581–587
Yoshida K (1978) Functional analysis, 5th edn. Springer, Berlin
Yushkevich AA (1997) The compactness of a policy space in dynamic programming via an extension theorem for Carathéodory functions. Math Oper Res 22:458–467
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
González-Hernández, J., Villarreal, C.E. Optimal policies for constrained average-cost Markov decision processes. TOP 19, 107–120 (2011). https://doi.org/10.1007/s11750-009-0110-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11750-009-0110-7