Dynamic priority allocation via restless bandit marginal productivity indices

Niño-Mora, José

doi:10.1007/s11750-007-0025-0

Dynamic priority allocation via restless bandit marginal productivity indices

Invited Paper
Published: 27 September 2007

Volume 15, pages 161–198, (2007)
Cite this article

TOP Aims and scope Submit manuscript

José Niño-Mora¹

Abstract

This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of such a kind. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated to address issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287–298, 1988) where he introduced the index for restless bandits that is the starting point of this work. Such an index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of “marginal productivity index” (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work appears to be currently more productive. Besides being tractable and widely applicable, a growing body of computational evidence indicates that such index policies typically achieve a near-optimal performance and substantially outperform benchmark policies derived from conventional approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Adan IJBF, van der Wal J (1998) Combining make to order and make to stock. OR Spektrum 20:73–81
Article Google Scholar
Altman E, Shwartz A (1989) Optimal priority assignment: a time sharing approach. IEEE T Autom Contr 34:1098–1102
Article Google Scholar
Altman E, Stidham S Jr (1995) Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Syst 21:267–291
Article Google Scholar
Asawa M, Teneketzis D (1996) Multi-armed bandits with switching penalties. IEEE T Autom Contr 41:328–348
Article Google Scholar
Bambos N, Michailidis G (2002) On parallel queuing with random server connectivity and routing constraints. Probab Eng Inform Sc 16:185–203
Article Google Scholar
Banks JS, Sundaram RK (1994) Switching costs and the Gittins index. Econometrica 62:687–694
Article Google Scholar
Bellman R (1956) A problem in the sequential design of experiments. Sankhyā 16:221–229
Google Scholar
Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math Oper Res 21:257–306
Google Scholar
Blackwell D (1962) Discrete dynamic programming. Ann Math Stat 33:719–726
Google Scholar
Boxma OJ (1995) Static optimization of queueing systems. In: Agarwal RP (ed) Recent trends in optimization theory and applications. World sci ser appl anal, vol 5. World Sci Publ, River Edge, pp 1–16
Google Scholar
Bradt RN, Johnson SM, Karlin S (1956) On sequential designs for maximizing the sum of n observations. Ann Math Stat 27:1060–1074
Google Scholar
Chen H, Yao DD (1990) Optimal intensity control of a queueing system with state-dependent capacity limit. IEEE T Autom Contr 35:459–464
Article Google Scholar
Clark JB (2005) The distribution of wealth: a theory of wages, interest and profits. Adamant media corporation (reissued from the edition published in 1908 by the Macmillan Company, New York, NY; first published: 1899)
Coffman EG Jr, Mitrani I (1980) A characterization of waiting time performance realizable by single-server queues. Oper Res 28:810–821
Google Scholar
Combé MB, Boxma OJ (1994) Optimization of static traffic allocation policies. Theor Comput Sci 125:17–43
Google Scholar
Cox DR, Smith WL (1961) Queues. Methuen, London
Google Scholar
De Véricourt F, Karaesmen F, Dallery Y (2000) Dynamic scheduling in a make-to-stock system: a partial characterization of optimal policies. Oper Res 48:811–819
Article Google Scholar
Derman C, Lieberman GJ, Ross SM (1980) On the optimal assignment of servers and a repairman. J Appl Probab 17:577–581
Article Google Scholar
Dongarra JJ, Eijkhout V (2000) Numerical linear algebra algorithms and software. J Comput Appl Math 123:489–514
Article Google Scholar
Dusonchet F, Hongler MO (2003a) Continuous-time restless bandit and dynamic scheduling for make-to-stock production. IEEE T Robotic Autom 19:977–990
Article Google Scholar
Dusonchet F, Hongler MO (2003b) Optimal hysteresis for a class of deterministic deteriorating two-armed bandit problem with switching costs. Automatica 39:1947–1955
Article Google Scholar
Ehsan N, Liu MY (2006) Optimal bandwidth allocation in a delay channel. IEEE J Sel Area Commun 24:1614–1626
Article Google Scholar
Feinberg EA, Shwartz A (2002) Mixed criteria. In: Feinberg EA, Shwartz A (eds) Handbook of Markov decision processes: methods and applications. Kluwer, Boston, pp 209–230
Google Scholar
Federgruen A, Groenevelt H (1988) Characterization and optimization of achievable performance in general queueing systems. Oper Res 36:733–741
Google Scholar
Gass S, Saaty T (1955) The computational algorithm for the parametric objective function. Naval Res Log 2:39–46
Article Google Scholar
Gittins JC (1979) Bandit processes and dynamic allocation indices (with discussion). J R Stat Soc B Met 41:148–177
Google Scholar
Gittins JC (1989) Multi-armed bandit allocation indices. Wiley, Chichester
Google Scholar
Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. In: Gani J, Sarkadi K, Vincze I (eds) Progress in statistics. European meeting of statisticians, Budapest, 1972. North-Holland, Amsterdam, pp 241–266
Google Scholar
Haji R, Newell GF (1971) A relation between stationary queue and waiting-time distributions. J Appl Probab 8:617–620
Article Google Scholar
Hernández-Lerma O, Hoyos-Reyes LF (2001) A multiobjective control approach to priority queues. Math Method Oper Res 53:265–277
Article Google Scholar
Hordijk A, Koole G (1990) On the optimality of the generalized shortest queue policy. Probab Eng Inform Sci 4:477–487
Google Scholar
Johri PK (1989) Optimality of the shortest line discipline with state-dependent service rates. Eur J Oper Res 41:157–161
Article Google Scholar
Jun T (2004) A survey on the bandit problem with switching costs. De Economist 152:513–541
Article Google Scholar
Kallmes MH, Cassandras CG (1995) Two approaches to optimal routing and admission control in systems with real-time traffic. J Optim Theory Appl 84:311–338
Article Google Scholar
Kantorovich LV (1959) Ekonomicheskii raschet nailuchshego ispolzovania resursov. Academy of Sciences, USSR (Engl trans: The best use of economic resources, Harvard University Press, Cambridge, 1965)
Google Scholar
Keilson J, Servi LD (1988) A distributional form of Little’s law. Oper Res Lett 7:223–227
Article Google Scholar
Kelly FP (1981) Multi-armed bandits with discount factor near one: the Bernoulli case. Ann Stat 9:987–1001
Google Scholar
Kim E, Van Oyen MP (1998) Beyond the c μ rule: dynamic scheduling of a two-class loss queue. Math Method Oper Res 48:17–36
Article Google Scholar
Kleinrock L (1965) A conservation law for a wide class of queueing disciplines. Naval Res Log 12:181–192
Article Google Scholar
Kleinrock L (1975) Queueing systems, vol I: Theory. Wiley, New York
Google Scholar
Klimov GP (1974) Time-sharing service systems. I. Theory Probab Appl 19:532–551
Article Google Scholar
Koopmans TC (1957) Three essays on the state of economic science. McGraw–Hill, New York, chap Allocation of resources and the price system
Google Scholar
Krishnan KR (1988) State-dependent allocation of job stream to parallel workstations. In: Proceedings of the 27th conference on decision and control. IEEE, Piscataway, pp 2332–2333
Chapter Google Scholar
Krishnan KR (1990) Joining the right queue: a state-dependent decision rule. IEEE T Autom Contr 35:104–108
Article Google Scholar
Lewis ME, Puterman ML (2002) Bias optimality. In: Feinberg EA, Schwartz A (ed) Handbook of Markov decision processes. Kluwer Academic, Boston, pp 89–111
Google Scholar
Lott C, Teneketzis D (2000) On the optimality of an index rule in multichannel allocation for single-hop mobile networks with multiple service classes. Probab Eng Inform Sc 14:259–297
Article Google Scholar
Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv Appl Probab 33:76–98
Article Google Scholar
Niño-Mora J (2002) Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Math Program 93:361–413
Article Google Scholar
Niño-Mora J (2003) Restless bandit marginal productivity indices, diminishing returns, and scheduling a multiclass make-to-order/-stock queue. In: Proceedings of the 41st annual Allerton conference on communication, control and computing. Univ of Illinois at Urbana-Champaign, Monticello, pp 100–109
Google Scholar
Niño-Mora J (2005) A marginal productivity index policy for the finite-horizon multiarmed bandit problem. In: Proceedings of the 44th IEEE conference on decision and control and European control conference ECC 2005. IEEE, Piscataway, pp 1718–1722
Chapter Google Scholar
Niño-Mora J (2006a) Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math Oper Res 31:50–84
Article Google Scholar
Niño-Mora J (2006b) Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue. Queueing Syst 54:281–312
Article Google Scholar
Niño-Mora J (2006c) Marginal productivity index policies for scheduling multiclass wireless transmissions. In: NGI 2006, proceedings of the 2nd EuroNGI conference on next generation Internet networks—design and engineering. IEEE, Piscataway, pp 342–349
Chapter Google Scholar
Niño-Mora J (2007a) A (2/3)n ³ fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS J Comput, vol 19, DOI: 10.1287/ijoc.1060.0206
Niño-Mora J (2007b) Marginal productivity index policies for admission control and routing to parallel multi-server loss queues with reneging. In: Network control and optimization, proceedings of the first EuroFGI international conference NET-COOP 2007. Lect Notes Comput Sc, vol 4465. Springer, Berlin, pp 138–149
Google Scholar
Niño-Mora J (2007c) Marginal productivity index policies for scheduling multiclass delay-/loss-sensitive traffic with delayed state observation. In: NGI 2007, proceedings of the 3rd EuroNGI conference on next generation Internet networks—design and engineering for heterogeneity. IEEE, Piscataway, pp 209–217
Google Scholar
Niño-Mora J (2007d) A faster index algorithm and a computational study for bandits with switching costs. INFORMS J Comput (in press)
Niño-Mora J (2007e) Characterization and computation of restless bandit marginal productivity indices. In: SMCtools’07, proceedings from the 2007 workshop on tools for solving structured Markov chains. ACM, New York
Google Scholar
Niño-Mora J (2007f) Computing an index policy for bandits with switching penalties. In: SMCtools’07, proceedings from the 2007 workshop on tools for solving structured Markov chains. ACM, New York
Google Scholar
Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math Oper Res 24:293–305
Google Scholar
Peña-Pérez A, Zipkin P (1997) Dynamic scheduling rules for a multiproduct make-to-stock queue. Oper Res 45:919–930
Google Scholar
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Google Scholar
Rothkopf MH (1966) Scheduling with random service times. Manag Sci 12:707–713
Article Google Scholar
Sevcik KC (1974) Scheduling for minimum total loss using service time distributions. J ACM 21:66–75
Article Google Scholar
Shanthikumar JG, Yao DD (1992) Multiclass queueing systems: polymatroidal structure and optimal scheduling control. Oper Res 40:S293–S299
Google Scholar
Smith WE (1956) Various optimizers for single-stage production. Naval Res Log 3:59–66
Article Google Scholar
Soman CA, Van Donk DP, Gaalman G (2004) Combined make-to-order and make-to-stock in a food production system. Int J Prod Econ 90:223–235
Article Google Scholar
Sparaggis PD, Cassandras CG, Towsley D (1993) On the duality between routing and scheduling systems with finite buffer space. IEEE T Autom Contr 38:1440–1446
Article Google Scholar
Stidham S Jr (1985) Optimal control of admission to a queueing system. IEEE T Autom Contr 30:705–713
Article Google Scholar
Tassiulas L, Ephremides A (1993) Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE T Inform Theory 39:466–478
Article Google Scholar
Tsoucas P (1991) The region of achievable performance in a model of Klimov. Tech Rep RC16543, IBM TJ Watson Research Center, Yorktown Heights, New York, NY
Varaiya PP, Walrand JC, Buyukkoc C (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE T Autom Contr 30:426–439
Article Google Scholar
Veatch MH, Wein LM (1996) Scheduling a multiclass make-to-stock queue: index policies and hedging points. Oper Res 44:634–647
Article Google Scholar
Wang YG (1997) Error bounds for calculation of the Gittins indices. Aust J Stat 39:225–233
Article Google Scholar
Wasserman KM, Bambos N (1996) Optimal server allocation to parallel queues with finite-capacity buffers. Probab Eng Inform Sc 10:279–285
Article Google Scholar
Weber RR (1992) On the Gittins index for multiarmed bandits. Ann Appl Probab 2:1024–1033
Google Scholar
Weber RR, Weiss G (1990) On an index policy for restless bandits. J Appl Probab 27:637–648
Article Google Scholar
Weber RR, Weiss G (1991) Addendum to “On an index policy for restless bandits”. Adv Appl Probab 23:429–430
Article Google Scholar
Wein LM (1992) Dynamic scheduling of a multiclass make-to-stock queue. Oper Res 40:724–735
Google Scholar
Whittle P (1980) Multi-armed bandits and the Gittins index. J R Stat Soc B Met 42:143–149
Google Scholar
Whittle P (1988) Restless bandits: activity allocation in a changing world. In: Gani J (ed) A celebration of applied probability. J appl probab, vol 25A. Applied Probability Trust, Sheffield, pp 287–298
Google Scholar
Whittle P (1996) Optimal control: basics and beyond. Wiley, Chichester
Google Scholar
Winston W (1977) Optimality of the shortest line discipline. J Appl Probab 14:181–189
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Statistics, Universidad Carlos III de Madrid, C/ Madrid 126, 28903, Getafe, Madrid, Spain
José Niño-Mora

Authors

José Niño-Mora
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to José Niño-Mora.

Additional information

This invited paper is discussed in the comments available at: http://dx.doi.org/10.1007/s11750-007-0026-z, http://dx.doi.org/10.1007/s11750-007-0027-y, http://dx.doi.org/10.1007/s11750-007-0028-x, http://dx.doi.org/10.1007/s11750-007-0029-9, http://dx.doi.org/10.1007/s11750-007-0030-3, http://dx.doi.org/10.1007/s11750-007-0031-2.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Niño-Mora, J. Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198 (2007). https://doi.org/10.1007/s11750-007-0025-0

Download citation

Published: 27 September 2007
Issue Date: December 2007
DOI: https://doi.org/10.1007/s11750-007-0025-0

Keywords

Mathematics Subject Classification (2000)

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Dynamic priority allocation via restless bandit marginal productivity indices

Abstract

Access this article

Similar content being viewed by others

Discrete Choice Experiments: A Guide to Model Specification, Estimation and Software

Partial Least Squares Structural Equation Modeling

Using propensity scores in difference-in-differences models to estimate the effects of a policy change

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification (2000)

Navigation

Dynamic priority allocation via restless bandit marginal productivity indices

Abstract

Access this article

Similar content being viewed by others

Discrete Choice Experiments: A Guide to Model Specification, Estimation and Software

Partial Least Squares Structural Equation Modeling

Using propensity scores in difference-in-differences models to estimate the effects of a policy change

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2000)

Search

Navigation