Abstract
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to the discounted and average reward optimality criteria, which are the most commonly used, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
References
Albright S.C. and Winston W. (1979). A Birth-Death Model of Advertising and Pricing. Advances in Applied Probability 11, 134–152.
Allen L.J.S. (2003). An Introduction to Stochastic Processes with Applications to Biology. Pearson Education.
Anderson W.J. (1991). Continuous-Time Markov Chains. Springer.
Bailey N.T.J. (1975). The Mathematical Theory of Infectious Diseases and Its Applications. Griffin.
Bartholomew D.J. (1973). Stochastic Models for Social Processes, 2nd Edition. Wiley.
Bellman R. (1957). Dynamic Programming. Princeton University Press.
Berge C. (1963). Topological Spaces. Macmillan.
Bertsekas D.P. (2001). Dynamic Programming and Optimal Control, Vol. II, 2nd Edition. Athena Scientific.
Blackwell D. (1962). Discrete Dynamic Programming. Annals of Mathematical Statistics 33, 719–726.
Cao X.R. (2005). Basic Ideas for Event-Based Optimality of Markov Systems. Discrete Event Dynamic Systems: Theory and Applications 15, 169–197.
Cao X.R. and Guo X.P. (2006). Continuous-Time Markov Decision Processes with n-Potential Optimality Criteria. Preprint.
Cao X.R. and Zhang J.Y. (2007). The nth-Order Bias Optimality for Multichain Markov Decision Processes. IEEE Transactions on Automatic Control. In press.
Dekker R. and Hordijk A. (1988). Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards. Mathematics of Operations Research 13, 395–420.
Dekker R. and Hordijk A. (1992). Recurrence Conditions for Average and Blackwell Optimality in Denumerable State Markov Decision Chains. Mathematics of Operations Research 17, 271–289.
Doshi B.T. (1976). Continuous-Time Control of Markov Processes on an Arbitrary State Space: Discounted Rewards. Annals of Statistics 4, 1219–1235.
Dynkin E.B. and Yushkevich A.A. (1979). Controlled Markov Processes. Springer.
Feinberg E.A. and Shwartz A. (2002). Handbook of Markov Decision Processes. Kluwer.
Feller W. (1940). On the Integro-Differential Equations of Purely Discontinuous Markoff Processes. Transactions of the American Mathematical Society 48, 488–515.
Fisher L. (1968). On the Recurrent Denumerable Decision Process. Annals of Mathematical Statistics 39, 424–434.
Gale D. (1967). On Optimal Development in a Multi-Sector Economy. Review of Economic Studies 34, 1–19.
Guo X.P. (2006). Continuous-Time Markov Decision Processes with Discounted Rewards: The Case of Polish Spaces. Mathematics of Operations Research (to appear).
Guo X.P. and Cao X.R. (2005). Optimal Control of Ergodic Continuous-Time Markov Chains with Average Sample-Path Rewards. SIAM Journal on Control and Optimization 44, 29–48.
Guo X.P. and Hernández-Lerma O. (2003a). Continuous-Time Controlled Markov Chains. Annals of Applied Probability 13, 363–388.
Guo X.P. and Hernández-Lerma O. (2003b). Continuous-Time Controlled Markov Chains with Discounted Rewards. Acta Applicandae Mathematicae 79, 195–216.
Guo X.P. and Hernández-Lerma O. (2003c). Drift and Monotonicity Conditions for Continuous-Time Controlled Markov Chains with an Average Criterion. IEEE Transactions on Automatic Control 48, 236–245.
Guo X.P. and Hernández-Lerma O. (2003d). Zero-Sum Games for Continuous-Time Markov Chains with Unbounded Transition and Average Payoff Rates. Journal of Applied Probability 40, 327–345.
Guo X.P. and Hernández-Lerma O. (2005a). Nonzero-Sum Games for Continuous-Time Markov Chains with Unbounded Discounted Payoffs. Journal of Applied Probability 42, 302–320.
Guo X.P. and Hernández-Lerma O. (2005b). Zero-Sum Continuous-Time Markov Games with Unbounded Transition and Discounted Payoff Rates. Bernoulli 11, 1009–1029.
Guo X.P. and Liu K. (2001). A Note on Optimality Conditions for Continuous-Time Markov Decision Processes with Average Cost Criterion. IEEE Transactions on Automatic Control 46, 1984–1989.
Guo X.P. and Rieder U. (2006). Average Optimality for Continuous-Time Markov Decision Processes in Polish Spaces. Annals of Applied Probability 16, 730–756.
Guo X.P. and Zhu W.P. (2002a). Denumerable State Continuous-Time Markov Decision Processes with Unbounded Cost and Transition Rates Under the Discounted Criterion. Journal of Applied Probability 39, 233–250.
Guo X.P. and Zhu W.P. (2002b). Denumerable State Continuous-Time Markov Decision Processes with Unbounded Cost and Transition Rates Under Average Criterion. ANZIAM Journal 43, 541–557.
Haviv M. and Puterman M.L. (1998). Bias Optimality in Controlled Queueing Systems. Journal of Applied Probability 35, 136–150.
Hernández-Lerma O. (1994). Lectures on Continuous-Time Markov Control Processes. Aportaciones Matemáticas, Vol. 3, Sociedad Matemática Mexicana, Mexico City.
Hernández-Lerma O. and Govindan T.E. (2001). Nonstationary Continuous-Time Markov Control Processes with Discounted Costs on Infinite Horizon. Acta Applicandae Mathematicae 67, 277–293.
Hernández-Lerma O. and Lasserre J.B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer.
Hernández-Lerma O. and Lasserre J.B. (1999). Further Topics on Discrete-Time Markov Control Processes. Springer.
Hernández-Lerma O. and Romera R. (2004). The Scalarization Approach to Multiobjective Markov Control Problems: Why Does It Work? Applied Mathematics and Optimization 50, 279–293.
Hilgert N. and Hernández-Lerma O. (2003). Bias Optimality Versus Strong 0-Discount Optimality in Markov Control Processes with Unbounded Costs. Acta Applicandae Mathematicae 76, 215–235.
Hordijk A. and Yushkevich A.A. (1999a). Blackwell Optimality in the Class of Stationary Policies in Markov Decision Chains with a Borel State and Unbounded Rewards. Mathematical Methods of Operations Research 49, 1–39.
Hordijk A. and Yushkevich A.A. (1999b). Blackwell Optimality in the Class of All Policies in Markov Decision Chains with a Borel State and Unbounded Rewards. Mathematical Methods of Operations Research 50, 421–448.
Hordijk A. and Yushkevich A.A. (2002). Blackwell Optimality. In: Feinberg E.A. and Shwartz A. (eds.), Handbook of Markov Decision Processes. Kluwer, 231–267.
Hou Z.T. and Guo X.P. (1998). Markov Decision Processes. Science and Technology Press of Hunan, Changsha, China. (In Chinese.)
Howard R.A. (1960). Dynamic Programming and Markov Processes. Wiley.
Hu Q. (1992). Discounted and Average Markov Decision Processes with Unbounded Rewards: New Conditions. Journal of Mathematical Analysis and Applications 171, 111–124.
Hu Q. (1996). Continuous Time Markov Decision Processes with Discounted Moment Criterion. Journal of Mathematical Analysis and Applications 203, 1–12.
Iosifescu M. and Tautu P. (1973). Stochastic Processes and Applications in Biology and Medicine, Vol. II: Models. Springer.
Jaskiewicz A. (2004). On the Equivalence of Two Expected Average Cost Criteria for Semi-Markov Control Processes. Mathematics of Operations Research 29, 326–338.
Kakumanu P. (1971). Continuously Discounted Markov Decision Models with Countable State and Action Spaces. Annals of Mathematical Statistics 42, 919–926.
Kakumanu P. (1972). Nondiscounted Continuous-Time Markov Decision Processes with Countable State and Action Spaces. SIAM Journal on Control 10, 210–220.
Kakumanu P. (1975). Continuous Time Markov Decision Processes with Average Return Criterion. Journal of Mathematical Analysis and Applications 52, 173–188.
Kakumanu P. (1977). Relation Between Continuous and Discrete Markovian Decision Problems. Naval Research Logistics Quarterly 24, 431–439.
Kato T. (1966). Perturbation Theory for Linear Operators. Springer.
Kermack W.O. and McKendrick A.G. (1927). Contributions to the Mathematical Theory of Epidemics. Proceedings of the Royal Society of London A 115, 700–721.
Kitayev M.Yu. (1985). Semi-Markov and Jump Markov Controlled Models: Average Cost Criterion. Theory of Probability and Its Applications 30, 272–288.
Kitayev M.Yu. and Rykov V.V. (1995). Controlled Queueing Systems. CRC Press.
Lasserre J.B. (1988). Conditions for the Existence of Average and Blackwell Optimal Stationary Policies in Denumerable Markov Decision Processes. Journal of Mathematical Analysis and Applications 136, 479–490.
Lefèvre C. (1979). Optimal Control of the Simple Stochastic Epidemic with Variable Recovery Rates. Mathematical Biosciences 44, 209–219.
Lefèvre C. (1981). Optimal Control of a Birth and Death Epidemic Process. Operations Research 29, 971–982.
Leizarowitz A. (1996). Overtaking and Almost-Sure Optimality for Infinite Horizon Markov Decision Processes. Mathematics of Operations Research 21, 158–181.
Lembersky M.R. (1974). On Maximal Rewards and ε-Optimal Policies in Continuous Time Markov Chains. Annals of Statistics 2, 159–169.
Lewis M.E., Ayhan H. and Foley R.D. (1999). Bias Optimality in a Queue with Admission Control. Probability in the Engineering and Informational Sciences 13, 309–327.
Lewis M.E., Ayhan H. and Foley R.D. (2002). Bias Optimal Admission Policies for a Nonstationary Multiclass Queueing System. Journal of Applied Probability 39, 20–37.
Lewis M.E. and Puterman M.L. (2001). A Note on Bias Optimality in Controlled Queueing Systems. Journal of Applied Probability 37, 300–305.
Lewis M.E. and Puterman M.L. (2002). A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes. IEEE Transactions on Automatic Control 46, 96–100.
Lippman S.A. (1975). Applying a New Device in the Optimization of Exponential Queueing Systems. Operations Research 23, 667–710.
Lund R.B., Meyn S.P. and Tweedie R.L. (1996). Computable Exponential Convergence Rates for Stochastically Ordered Markov Processes. Annals of Applied Probability 6, 218–237.
Mangel M. (1985). Decision and Control in Uncertain Resource Systems. Academic Press.
Massy W.F., Montgomery D.B. and Morrison D.G. (1970). Stochastic Models of Buying Behavior. MIT Press.
Meyn S.P. and Tweedie R.L. (1993). Stability of Markovian Processes III: Foster-Lyapunov Criteria for Continuous-Time Processes. Advances in Applied Probability 25, 518–548.
Miller B.L. (1968). Finite State Continuous Time Markov Decision Processes with an Infinite Planning Horizon. Journal of Mathematical Analysis and Applications 22, 552–569.
Miller B.L. and Veinott A.F. (1969). Discrete Dynamic Programming with a Small Interest Rate. Annals of Mathematical Statistics 40, 366–370.
Piunovskii A.B. (1998). A Controlled Jump Discounted Model with Constraints. Theory of Probability and Its Applications 42, 51–72.
Piunovskii A.B. (2004). Multicriteria Impulsive Control of Jump Markov Processes. Mathematical Methods of Operations Research 60, 125–144.
Prieto-Rumeau T. (2006). Blackwell Optimality in the Class of Markov Policies for Continuous-Time Controlled Markov Chains. Acta Applicandae Mathematicae 92, 77–96.
Prieto-Rumeau T. and Hernández-Lerma O. (2005a). The Laurent Series, Sensitive Discount and Blackwell Optimality for Continuous-Time Controlled Markov Chains. Mathematical Methods of Operations Research 61, 123–145.
Prieto-Rumeau T. and Hernández-Lerma O. (2005b). Bias and Overtaking Equilibria for Zero-Sum Continuous-Time Markov Games. Mathematical Methods of Operations Research 61, 437–454.
Prieto-Rumeau T. and Hernández-Lerma O. (2006a). Bias Optimality for Continuous-Time Controlled Markov Chains. SIAM Journal on Control and Optimization 45, 51–73.
Prieto-Rumeau T. and Hernández-Lerma O. (2006b). A Unified Approach to Continuous-Time Discounted Markov Control Processes. Morfismos 10 (to appear).
Prieto-Rumeau T. and Hernández-Lerma O. (2006c). Ergodic Control of Continuous-Time Markov Chains with Pathwise Constraints. Preprint.
Prieto-Rumeau T. and Hernández-Lerma O. (2006d). Variance Minimization and the Overtaking Optimality Approach to Continuous-Time Markov Control Chains. Preprint.
Puterman M.L. (1974). Sensitive Discount Optimality in Controlled One-Dimensional Diffusions. Annals of Probability 2, 408–419.
Puterman M.L. (1994). Markov Decision Processes. Wiley.
Qiu Q., Wu Q. and Pedram M. (2001). Stochastic Modeling of a Power-Managed System: Construction and Optimization. IEEE Transactions on Computer Aided Design 20, 1200–1217.
Ramsey F.P. (1928). A Mathematical Theory of Saving. Economic Journal 38, 543–559.
Ross S.M. (1970). Applied Probability Models with Optimization Applications. Holden-Day.
Rykov V.V. (1966). Markov Sequential Decision Processes with Finite State and Decision Space. Theory of Probability and Its Applications 11, 302–311.
Schäl M. (1992). On the Second Optimality Equation for Semi-Markov Decision Models. Mathematics of Operations Research 17, 470–486.
Sennott L.I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley.
Serfozo R.F. (1979). An Equivalence Between Continuous and Discrete Time Markov Decision Processes. Operations Research 27, 616–620.
Sladký K. (1978). Sensitive Optimality Criteria for Continuous Time Markov Processes. Transactions of the Eighth Prague Conference on Information Theory, Statistical Decision Functions and Random Processes (Prague, 1978), Vol. B, 221–225.
Song J.S. (1987). Continuous-Time Markov Decision Programming with Non-Uniformly Bounded Transition Rates. Scientia Sinica 12, 1258–1267. (In Chinese.)
Tadj L. and Choudhury G. (2005). Optimal Design and Control of Queues. TOP 13, 359–412.
Taylor H.M. (1976). A Laurent Series for the Resolvent of a Strongly Continuous Stochastic Semi-Group. Mathematical Programming Study 6, 258–263.
Veinott A.F. (1966). On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting. Annals of Mathematical Statistics 37, 1284–1294.
Veinott A.F. (1969). Discrete Dynamic Programming with Sensitive Discount Optimality Criteria. Annals of Mathematical Statistics 40, 1635–1660.
Vidale M.L. and Wolfe H.B. (1957). An Operations Research Study of Sales Response to Advertising. Operations Research 5, 370–381.
von Weizsäcker C.C. (1965). Existence of Optimal Programs of Accumulation for an Infinite Horizon. Review of Economic Studies 32, 85–104.
Wickwire K. (1977). Mathematical Models for the Control of Pests and Infectious Diseases: A Survey. Theoretical Population Biology 11, 182–238.
Wu C.B. (1997). Continuous Time Markov Decision Processes with Unbounded Reward and Non-Uniformly Bounded Transition Rate Under Discounted Criterion. Acta Mathematicae Applicandae Sinica 20, 196–208.
Ye L., Guo X.P. and Hernández-Lerma O. (2006). Existence and Regularity of Nonhomogeneous Q(t)-Processes under Measurability Conditions. Preprint.
Yosida K. (1980). Functional Analysis, Sixth Edition. Springer.
Yushkevich A.A. (1973). On a Class of Strategies in General Markov Decision Models. Theory of Probability and Its Applications 18, 777–779.
Yushkevich A.A. (1977). Controlled Markov Models with Countable State and Continuous Time. Theory of Probability and Its Applications 22, 215–235.
Yushkevich A.A. (1994). Blackwell Optimal Policies in a Markov Decision Process with a Borel State Space. Mathematical Methods of Operations Research 40, 253–288.
Yushkevich A.A. (1997). Blackwell Optimality in Continuous in Action Markov Decision Processes. SIAM Journal on Control and Optimization 35, 2157–2182.
Yushkevich A.A. and Feinberg E.A. (1979). On Homogeneous Markov Model with Continuous Time and Finite or Countable State Space. Theory of Probability and Its Applications 24, 156–161.
Bather J. (1976). Optimal Stationary Policies for Denumerable Markov Chains in Continuous Time. Advances in Applied Probability 8, 148–155.
Cao X.-R. (2003a). Semi-Markov Decision Problems and Performance Sensitivity Analysis. IEEE Transactions on Automatic Control 48, 758–769.
Cao X.-R. (2003b). A Sensitivity View of Markov Decision Processes and Reinforcement Learning. In: Gong W. and Shi L. (eds.), Modeling, Control and Optimization of Complex Systems. Kluwer, 261–283.
Cao X.-R. (2003c). From Perturbation Analysis to Markov Decision Processes and Reinforcement Learning. Discrete Event Dynamic Systems 13, 9–39.
Cao X.-R. (2004). The Potential Structure of Sample Paths and Performance Sensitivities of Markov Systems. IEEE Transactions on Automatic Control 49, 2129–2142.
Dijk N.V. (1993). Queueing Networks and Product Forms: A System Approach. Wiley.
Hu Q. (1990). CTMDP and Its Relationship with DTMDP. Chinese Science Bulletin 35, 710–714.
Hu Q., Liu J. and Yue W. (2003). Continuous Time Markov Decision Processes: Discounted Total Reward. International Journal of Pure and Applied Mathematics 7, 147–175.
Hu Q. and Wang J. (1998). Continuous Time Markov Decision Processes with Nonuniformly Bounded Rate: Expected Total Rewards. Optimization 43, 219–233.
Borkar V.S. (2004). Controlled Diffusion Processes. Probability Surveys 2, 213–244.
Cao X.R. and Guo X.P. (2004). Partially Observable Markov Decision Processes with Reward Information. Proceedings of the 43rd IEEE Conference on Decision and Control, 4393–4398.
Hernández-Lerma O. (1989). Adaptive Markov Control Processes. Springer.
Kaelbling L.P., Littman M.L. and Moore A.W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4, 237–285.
Neyman A. and Sorin S. (2001). Stochastic Games and Applications. NATO Science Series, 570.
Additional information
Research partially supported by grants NSFC, DRFP and NCET.
Research partially supported by CONACyT (Mexico) Grant 45693-F.
Guo, X., Hernández-Lerma, O., Prieto-Rumeau, T. et al. A survey of recent results on continuous-time Markov decision processes. TOP 14, 177–261 (2006). https://doi.org/10.1007/BF02837562
Key Words
- Continuous-time Markov decision processes (also known as controlled Markov chains)
- unbounded reward and transition rates
- discounted reward
- average reward
- bias optimality
- sensitive discount criteria