Abstract
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to the discounted and average reward optimality criteria, which are the most commonly used, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
References
Albright S.C. and Winston W. (1979). A Birth-Death Model of Advertising and Pricing. Advances in Applied Probability 11, 134–152.
Allen L.J.S. (2003). An Introduction to Stochastic Processes with Applications to Biology. Pearson Education.
Anderson W.J. (1991). Continuous-Time Markov Chains. Springer.
Bailey N.T.J. (1975). The Mathematical Theory of Infectious Diseases and Its Applications. Griffin.
Bartholomew D.J. (1973). Stochastic Models for Social Processes, 2nd Edition. Wiley.
Bellman R. (1957). Dynamic Programming. Princeton University Press.
Berge C. (1963). Topological Spaces. Macmillan.
Bertsekas D.P. (2001). Dynamic Programming and Optimal Control, Vol. II, 2nd Edition. Athena Scientific.
Blackwell D. (1962). Discrete Dynamic Programming. Annals of Mathematical Statistics 33, 719–726.
Cao X.R. (2005). Basic Ideas for Event-Based Optimality of Markov Systems. Discrete Event Dynamic Systems: Theory and Applications 15, 169–197.
Cao X.R. and Guo X.P. (2006). Continuous-Time Markov Decision Processes with n-Potential Optimality Criteria. Preprint.
Cao X.R. and Zhang J.Y. (2007). The nth-Order Bias Optimality for Multichain Markov Decision Processes. IEEE Transactions on Automatic Control. In press.
Dekker R. and Hordijk A. (1988). Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards. Mathematics of Operations Research 13, 395–420.
Dekker R. and Hordijk A. (1992). Recurrence Conditions for Average and Blackwell Optimality in Denumerable State Markov Decision Chains. Mathematics of Operations Research 17, 271–289.
Doshi B.T. (1976). Continuous-Time Control of Markov Processes on an Arbitrary State Space: Discounted Rewards. Annals of Statistics 4, 1219–1235.
Dynkin E.B. and Yushkevich A.A. (1979). Controlled Markov Processes. Springer.
Feinberg E.A. and Shwartz A. (2002). Handbook of Markov Decision Processes. Kluwer.
Feller W. (1940). On the Integro-Differential Equations of Purely Discontinuous Markoff Processes. Transactions of the American Mathematical Society 48, 488–515.
Fisher L. (1968). On the Recurrent Denumerable Decision Process. Annals of Mathematical Statistics 39, 424–434.
Gale D. (1967). On Optimal Development in a Multi-Sector Economy. Review of Economic Studies 34, 1–19.
Guo X.P. (2006). Continuous-Time Markov Decision Processes with Discounted Rewards: The Case of Polish Spaces. Mathematics of Operations Research (to appear).
Guo X.P. and Cao X.R. (2005). Optimal Control of Ergodic Continuous-Time Markov Chains with Average Sample-Path Rewards. SIAM Journal on Control and Optimization 44, 29–48.
Guo X.P. and Hernández-Lerma O. (2003a). Continuous-Time Controlled Markov Chains. Annals of Applied Probability 13, 363–388.
Guo X.P. and Hernández-Lerma O. (2003b). Continuous-Time Controlled Markov Chains with Discounted Rewards. Acta Applicandae Mathematicae 79, 195–216.
Guo X.P. and Hernández-Lerma O. (2003c). Drift and Monotonicity Conditions for Continuous-Time Controlled Markov Chains with an Average Criterion. IEEE Transactions on Automatic Control 48, 236–245.
Guo X.P. and Hernández-Lerma O. (2003d). Zero-Sum Games for Continuous-Time Markov Chains with Unbounded Transition and Average Payoff Rates. Journal of Applied Probability 40, 327–345.
Guo X.P. and Hernández-Lerma O. (2005a). Nonzero-Sum Games for Continuous-Time Markov Chains with Unbounded Discounted Payoffs. Journal of Applied Probability 42, 302–320.
Guo X.P. and Hernández-Lerma O. (2005b). Zero-Sum Continuous-Time Markov Games with Unbounded Transition and Discounted Payoff Rates. Bernoulli 11, 1009–1029.
Guo X.P. and Liu K. (2001). A Note on Optimality Conditions for Continuous-Time Markov Decision Processes with Average Cost Criterion. IEEE Transactions on Automatic Control 46, 1984–1989.
Guo X.P. and Rieder U. (2006). Average Optimality for Continuous-Time Markov Decision Processes in Polish Spaces. Annals of Applied Probability 16, 730–756.
Guo X.P. and Zhu W.P. (2002a). Denumerable State Continuous-Time Markov Decision Processes with Unbounded Cost and Transition Rates Under the Discounted Criterion. Journal of Applied Probability 39, 233–250.
Guo X.P. and Zhu W.P. (2002b). Denumerable State Continuous-Time Markov Decision Processes with Unbounded Cost and Transition Rates Under Average Criterion. ANZIAM Journal 43, 541–557.
Haviv M. and Puterman M.L. (1998). Bias Optimality in Controlled Queueing Systems. Journal of Applied Probability 35, 136–150.
Hernández-Lerma O. (1994). Lectures on Continuous-Time Markov Control Processes. Aportaciones Matemáticas, Vol. 3, Sociedad Matemática Mexicana, Mexico City.
Hernández-Lerma O. and Govindan T.E. (2001). Nonstationary Continuous-Time Markov Control Processes with Discounted Costs on Infinite Horizon. Acta Applicandae Mathematicae 67, 277–293.
Hernández-Lerma O. and Lasserre J.B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer.
Hernández-Lerma O. and Lasserre J.B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer.
Hernández-Lerma O. and Romera R. (2004). The Scalarization Approach to Multiobjective Markov Control Problems: Why Does It Work? Applied Mathematics and Optimization 50, 279–293.
Hilgert N. and Hernández-Lerma O. (2003). Bias Optimality Versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs. Acta Applicandae Mathematicae 76, 215–235.
Hordijk A. and Yushkevich A.A. (1999a). Blackwell Optimality in the Class of Stationary Policies in Markov Decision Chains with a Borel State and Unbounded Rewards. Mathematical Methods of Operations Research 49, 1–39.
Hordijk A. and Yushkevich A.A. (1999b). Blackwell Optimality in the Class of All Policies in Markov Decision Chains with a Borel State and Unbounded Rewards. Mathematical Methods of Operations Research 50, 421–448.
Hordijk A. and Yushkevich A.A. (2002). Blackwell Optimality. In: Feinberg E.A. and Shwartz A. (eds.), Handbook of Markov Decision Processes. Kluwer, 231–267.
Hou Z.T. and Guo X.P. (1998). Markov Decision Processes. Science and Technology Press of Hunan, Changsha, China. (In Chinese.)
Howard R.A. (1960). Dynamic Programming and Markov Processes. Wiley.
Hu Q. (1992). Discounted and Average Markov Decision Processes with Unbounded Rewards: New Conditions. Journal of Mathematical Analysis and Applications 171, 111–124.
Hu Q. (1996). Continuous Time Markov Decision Processes with Discounted Moment Criterion. Journal of Mathematical Analysis and Applications 203, 1–12.
Iosifescu M. and Tautu P. (1973). Stochastic Processes and Applications in Biology and Medicine, Vol. II: Models. Springer.
Jaskiewicz A. (2004). On the Equivalence of Two Expected Average Cost Criteria for Semi-Markov Control Processes. Mathematics of Operations Research 29, 326–338.
Kakumanu P. (1971). Continuously Discounted Markov Decision Models with Countable State and Action Spaces. Annals of Mathematical Statistics 42, 919–926.
Kakumanu P. (1972). Nondiscounted Continuous-Time Markov Decision Processes with Countable State and Action Spaces. SIAM Journal on Control 10, 210–220.
Kakumanu P. (1975). Continuous Time Markov Decision Processes with Average Return Criterion. Journal of Mathematical Analysis and Applications 52, 173–188.
Kakumanu P. (1977). Relation Between Continuous and Discrete Markovian Decision Problems. Naval Research Logistics Quarterly 24, 431–439.
Kato T. (1966). Perturbation Theory for Linear Operators. Springer.
Kermack W.O. and McKendrick A.G. (1927). Contributions to the Mathematical Theory of Epidemics. Proceedings of the Royal Society of London A 115, 700–721.
Kitayev M.Yu. (1985). Semi-Markov and Jump Markov Controlled Models: Average Cost Criterion. Theory of Probability and Its Applications 30, 272–288.
Kitayev M.Yu. and Rykov V.V. (1995). Controlled Queueing Systems. CRC Press.
Lasserre J.B. (1988). Conditions for the Existence of Average and Blackwell Optimal Stationary Policies in Denumerable Markov Decision Processes. Journal of Mathematical Analysis and Applications 136, 479–490.
Lefèvre C. (1979). Optimal Control of the Simple Stochastic Epidemic with Variable Recovery Rates. Mathematical Biosciences 44, 209–219.
Lefèvre C. (1981). Optimal Control of a Birth and Death Epidemic Process. Operations Research 29, 971–982.
Leizarowitz A. (1996). Overtaking and Almost-Sure Optimality for Infinite Horizon Markov Decision Processes. Mathematics of Operations Research 21, 158–181.
Lembersky M.R. (1974). On Maximal Rewards and ε-Optimal Policies in Continuous Time Markov Chains. Annals of Statistics 2, 159–169.
Lewis M.E., Ayhan H. and Foley R.D. (1999). Bias Optimality in a Queue with Admission Control. Probability in the Engineering and Informational Sciences 13, 309–327.
Lewis M.E., Ayhan H. and Foley R.D. (2002). Bias Optimal Admission Policies for a Nonstationary Multiclass Queueing System. Journal of Applied Probability 39, 20–37.
Lewis M.E. and Puterman M.L. (2001). A Note on Bias Optimality in Controlled Queueing Systems. Journal of Applied Probability 37, 300–305.
Lewis M.E. and Puterman M.L. (2002). A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes. IEEE Transactions on Automatic Control 46, 96–100.
Lippman S.A. (1975). Applying a New Device in the Optimization of Exponential Queueing Systems. Operations Research 23, 667–710.
Lund R.B., Meyn S.P. and Tweedie R.L. (1996). Computable Exponential Convergence Rates for Stochastically Ordered Markov Processes. Annals of Applied Probability 6, 218–237.
Mangel M. (1985). Decision and Control in Uncertain Resource Systems. Academic Press.
Massy W.F., Montgomery D.B. and Morrison D.G. (1970). Stochastic Models of Buying Behavior. MIT Press.
Meyn S.P. and Tweedie R.L. (1993). Stability of Markovian Processes III: Foster-Lyapunov Criteria for Continuous-Time Processes. Advances in Applied Probability 25, 518–548.
Miller B.L. (1968). Finite State Continuous Time Markov Decision Processes with an Infinite Planning Horizon. Journal of Mathematical Analysis and Applications 22, 552–569.
Miller B.L. and Veinott A.F. (1969). Discrete Dynamic Programming with a Small Interest Rate. Annals of Mathematical Statistics 40, 366–370.
Piunovskii A.B. (1998). A Controlled Jump Discounted Model with Constraints. Theory of Probability and Its Applications 42, 51–72.
Piunovskii A.B. (2004). Multicriteria Impulsive Control of Jump Markov Processes. Mathematical Methods of Operations Research 60, 125–144.
Prieto-Rumeau T. (2006). Blackwell Optimality in the Class of Markov Policies for Continuous-Time Controlled Markov Chains. Acta Applicandae Mathematicae 92, 77–96.
Prieto-Rumeau T. and Hernández-Lerma O. (2005a). The Laurent Series, Sensitive Discount and Blackwell Optimality for Continuous-Time Controlled Markov Chains. Mathematical Methods of Operations Research 61, 123–145.
Prieto-Rumeau T. and Hernández-Lerma O. (2005b). Bias and Overtaking Equilibria for Zero-Sum Continuous-Time Markov Games. Mathematical Methods of Operations Research 61, 437–454.
Prieto-Rumeau T. and Hernández-Lerma O. (2006a). Bias Optimality for Continuous-Time Controlled Markov Chains. SIAM Journal on Control and Optimization 45, 51–73.
Prieto-Rumeau T. and Hernández-Lerma O. (2006b). A Unified Approach to Continuous-Time Discounted Markov Control Processes. Morfismos 10 (to appear).
Prieto-Rumeau T. and Hernández-Lerma O. (2006c). Ergodic Control of Continuous-Time Markov Chains with Pathwise Constraints. Preprint.
Prieto-Rumeau T. and Hernández-Lerma O. (2006d). Variance Minimization and the Overtaking Optimality Approach to Continuous-Time Markov Control Chains. Preprint.
Puterman M.L. (1974). Sensitive Discount Optimality in Controlled One-Dimensional Diffusions. Annals of Probability 2, 408–419.
Puterman M.L. (1994). Markov Decision Processes. Wiley.
Qiu Q., Wu Q. and Pedram M. (2001). Stochastic Modeling of a Power-Managed System: Construction and Optimization. IEEE Transactions on Computer Aided Design 20, 1200–1217.
Ramsey F.P. (1928). A Mathematical Theory of Saving. Economic Journal 38, 543–559.
Ross S.M. (1970). Applied Probability Models with Optimization Applications. Holden-Day.
Rykov V.V. (1966). Markov Sequential Decision Processes with Finite State and Decision Space. Theory of Probability and Its Applications 11, 302–311.
Schäl M. (1992). On the Second Optimality Equation for Semi-Markov Decision Models. Mathematics of Operations Research 17, 470–486.
Sennott L.I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley.
Serfozo R.F. (1979). An Equivalence Between Continuous and Discrete Time Markov Decision Processes. Operations Research 27, 616–620.
Sladký K. (1978). Sensitive Optimality Criteria for Continuous Time Markov Processes. Transactions of the Eighth Prague Conference on Information Theory, Statistical Decision Functions and Random Processes (Prague, 1978), Vol. B, 221–225.
Song J.S. (1987). Continuous-Time Markov Decision Programming with Non-Uniformly Bounded Transition Rates. Scientia Sinica 12, 1258–1267. (In Chinese.)
Tadj L. and Choudhury G. (2005). Optimal Design and Control of Queues. TOP 13, 359–412.
Taylor H.M. (1976). A Laurent Series for the Resolvent of a Strongly Continuous Stochastic Semi-Group. Mathematical Programming Study 6, 258–263.
Veinott A.F. (1966). On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting. Annals of Mathematical Statistics 37, 1284–1294.
Veinott A.F. (1969). Discrete Dynamic Programming with Sensitive Discount Optimality Criteria. Annals of Mathematical Statistics 40, 1635–1660.
Vidale M.L. and Wolfe H.B. (1957). An Operations Research Study of Sales Response to Advertising. Operations Research 5, 370–381.
von Weizsäcker C.C. (1965). Existence of Optimal Programs of Accumulation for an Infinite Horizon. Review of Economic Studies 32, 85–104.
Wickwire K. (1977). Mathematical Models for the Control of Pests and Infectious Diseases: A Survey. Theoretical Population Biology 11, 182–238.
Wu C.B. (1997). Continuous Time Markov Decision Processes with Unbounded Reward and Non-Uniformly Bounded Transition Rate Under Discounted Criterion. Acta Mathematicae Applicandae Sinica 20, 196–208.
Ye L., Guo X.P. and Hernández-Lerma O. (2006). Existence and Regularity of Nonhomogeneous Q(t)-Processes under Measurability Conditions. Preprint.
Yosida K. (1980). Functional Analysis, Sixth Edition. Springer.
Yushkevich A.A. (1973). On a Class of Strategies in General Markov Decision Models. Theory of Probability and Its Applications 18, 777–779.
Yushkevich A.A. (1977). Controlled Markov Models with Countable State and Continuous Time. Theory of Probability and Its Applications 22, 215–235.
Yushkevich A.A. (1994). Blackwell Optimal Policies in a Markov Decision Process with a Borel State Space. Mathematical Methods of Operations Research 40, 253–288.
Yushkevich A.A. (1997). Blackwell Optimality in Continuous in Action Markov Decision Processes. SIAM Journal on Control and Optimization 35, 2157–2182.
Yushkevich A.A. and Feinberg E.A. (1979). On Homogeneous Markov Model with Continuous Time and Finite or Countable State Space. Theory of Probability and Its Applications 24, 156–161.
Bather J. (1976). Optimal Stationary Policies for Denumerable Markov Chains in Continuous Time. Advances in Applied Probability 8, 148–155.
Cao X.-R. (2003a). Semi-Markov Decision Problems and Performance Sensitivity Analysis. IEEE Transactions on Automatic Control 48, 758–769.
Cao X.-R. (2003b). A Sensitivity View of Markov Decision Processes and Reinforcement Learning. In: Gong W. and Shi L. (eds.), Modeling, Control and Optimization of Complex Systems. Kluwer, 261–283.
Cao X.-R. (2003c). From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems 13, 9–39.
Cao X.-R. (2004). The Potential Structure of Sample Paths and Performance Sensitivities of Markov Systems. IEEE Transactions on Automatic Control 49, 2129–2142.
Dijk N.V. (1993). Queueing Networks and Product Forms: A System Approach. Wiley.
Hu Q. (1990). CTMDP and Its Relationship with DTMDP. Chinese Science Bulletin 35, 710–714.
Hu Q., Liu J. and Yue W. (2003). Continuous Time Markov Decision Processes: Discounted Total Reward. International Journal of Pure and Applied Mathematics 7, 147–175.
Hu Q. and Wang J. (1998). Continuous Time Markov Decision Processes with Nonuniformly Bounded Rate: Expected Total Rewards. Optimization 43, 219–233.
Borkar V.S. (2004). Controlled Diffusion Processes. Probability Surveys 2, 213–244.
Cao X.R. and Guo X.P. (2004). Partially Observable Markov Decision Processes with Reward Information. Proceedings of the 43rd IEEE Conference on Decision and Control, 4393–4398.
Hernández-Lerma O. (1989). Adaptive Markov Control Processes. Springer.
Kaelbling L.P., Littman M.L. and Moore A.W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285.
Neyman A. and Sorin S. (2001). Stochastic Games and Applications. NATO Science Series, 570.
Additional information
Research partially supported by grants NSFC, DRFP and NCET.
Research partially supported by CONACyT (Mexico) Grant 45693-F.
Guo, X., Hernández-Lerma, O., Prieto-Rumeau, T. et al. A survey of recent results on continuous-time Markov decision processes. TOP 14, 177–261 (2006). https://doi.org/10.1007/BF02837562
Key Words
- Continuous-time Markov decision processes (also known as controlled Markov chains)
- unbounded reward and transition rates
- discounted reward
- average reward
- bias optimality
- sensitive discount criteria