Skip to main content
Log in

Dynamic priority allocation via restless bandit marginal productivity indices

  • Invited Paper
  • Published:
TOP Aims and scope Submit manuscript

Abstract

This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of such a kind. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated to address issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287–298, 1988) where he introduced the index for restless bandits that is the starting point of this work. Such an index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of “marginal productivity index” (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work appears to be currently more productive. Besides being tractable and widely applicable, a growing body of computational evidence indicates that such index policies typically achieve a near-optimal performance and substantially outperform benchmark policies derived from conventional approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Adan IJBF, van der Wal J (1998) Combining make to order and make to stock. OR Spektrum 20:73–81

    Article  Google Scholar 

  • Altman E, Shwartz A (1989) Optimal priority assignment: a time sharing approach. IEEE T Autom Contr 34:1098–1102

    Article  Google Scholar 

  • Altman E, Stidham S Jr (1995) Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Syst 21:267–291

    Article  Google Scholar 

  • Asawa M, Teneketzis D (1996) Multi-armed bandits with switching penalties. IEEE T Autom Contr 41:328–348

    Article  Google Scholar 

  • Bambos N, Michailidis G (2002) On parallel queuing with random server connectivity and routing constraints. Probab Eng Inform Sc 16:185–203

    Article  Google Scholar 

  • Banks JS, Sundaram RK (1994) Switching costs and the Gittins index. Econometrica 62:687–694

    Article  Google Scholar 

  • Bellman R (1956) A problem in the sequential design of experiments. Sankhyā 16:221–229

    Google Scholar 

  • Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math Oper Res 21:257–306

    Google Scholar 

  • Blackwell D (1962) Discrete dynamic programming. Ann Math Stat 33:719–726

    Google Scholar 

  • Boxma OJ (1995) Static optimization of queueing systems. In: Agarwal RP (ed) Recent trends in optimization theory and applications. World sci ser appl anal, vol 5. World Sci Publ, River Edge, pp 1–16

    Google Scholar 

  • Bradt RN, Johnson SM, Karlin S (1956) On sequential designs for maximizing the sum of n observations. Ann Math Stat 27:1060–1074

    Google Scholar 

  • Chen H, Yao DD (1990) Optimal intensity control of a queueing system with state-dependent capacity limit. IEEE T Autom Contr 35:459–464

    Article  Google Scholar 

  • Clark JB (2005) The distribution of wealth: a theory of wages, interest and profits. Adamant media corporation (reissued from the edition published in 1908 by the Macmillan Company, New York, NY; first published: 1899)

  • Coffman EG Jr, Mitrani I (1980) A characterization of waiting time performance realizable by single-server queues. Oper Res 28:810–821

    Google Scholar 

  • Combé MB, Boxma OJ (1994) Optimization of static traffic allocation policies. Theor Comput Sci 125:17–43

    Google Scholar 

  • Cox DR, Smith WL (1961) Queues. Methuen, London

    Google Scholar 

  • De Véricourt F, Karaesmen F, Dallery Y (2000) Dynamic scheduling in a make-to-stock system: a partial characterization of optimal policies. Oper Res 48:811–819

    Article  Google Scholar 

  • Derman C, Lieberman GJ, Ross SM (1980) On the optimal assignment of servers and a repairman. J Appl Probab 17:577–581

    Article  Google Scholar 

  • Dongarra JJ, Eijkhout V (2000) Numerical linear algebra algorithms and software. J Comput Appl Math 123:489–514

    Article  Google Scholar 

  • Dusonchet F, Hongler MO (2003a) Continuous-time restless bandit and dynamic scheduling for make-to-stock production. IEEE T Robotic Autom 19:977–990

    Article  Google Scholar 

  • Dusonchet F, Hongler MO (2003b) Optimal hysteresis for a class of deterministic deteriorating two-armed bandit problem with switching costs. Automatica 39:1947–1955

    Article  Google Scholar 

  • Ehsan N, Liu MY (2006) Optimal bandwidth allocation in a delay channel. IEEE J Sel Area Commun 24:1614–1626

    Article  Google Scholar 

  • Feinberg EA, Shwartz A (2002) Mixed criteria. In: Feinberg EA, Shwartz A (eds) Handbook of Markov decision processes: methods and applications. Kluwer, Boston, pp 209–230

    Google Scholar 

  • Federgruen A, Groenevelt H (1988) Characterization and optimization of achievable performance in general queueing systems. Oper Res 36:733–741

    Google Scholar 

  • Gass S, Saaty T (1955) The computational algorithm for the parametric objective function. Naval Res Log 2:39–46

    Article  Google Scholar 

  • Gittins JC (1979) Bandit processes and dynamic allocation indices (with discussion). J R Stat Soc B Met 41:148–177

    Google Scholar 

  • Gittins JC (1989) Multi-armed bandit allocation indices. Wiley, Chichester

    Google Scholar 

  • Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. In: Gani J, Sarkadi K, Vincze I (eds) Progress in statistics. European meeting of statisticians, Budapest, 1972. North-Holland, Amsterdam, pp 241–266

    Google Scholar 

  • Haji R, Newell GF (1971) A relation between stationary queue and waiting-time distributions. J Appl Probab 8:617–620

    Article  Google Scholar 

  • Hernández-Lerma O, Hoyos-Reyes LF (2001) A multiobjective control approach to priority queues. Math Method Oper Res 53:265–277

    Article  Google Scholar 

  • Hordijk A, Koole G (1990) On the optimality of the generalized shortest queue policy. Probab Eng Inform Sci 4:477–487

    Google Scholar 

  • Johri PK (1989) Optimality of the shortest line discipline with state-dependent service rates. Eur J Oper Res 41:157–161

    Article  Google Scholar 

  • Jun T (2004) A survey on the bandit problem with switching costs. De Economist 152:513–541

    Article  Google Scholar 

  • Kallmes MH, Cassandras CG (1995) Two approaches to optimal routing and admission control in systems with real-time traffic. J Optim Theory Appl 84:311–338

    Article  Google Scholar 

  • Kantorovich LV (1959) Ekonomicheskii raschet nailuchshego ispolzovania resursov. Academy of Sciences, USSR (Engl trans: The best use of economic resources, Harvard University Press, Cambridge, 1965)

    Google Scholar 

  • Keilson J, Servi LD (1988) A distributional form of Little’s law. Oper Res Lett 7:223–227

    Article  Google Scholar 

  • Kelly FP (1981) Multi-armed bandits with discount factor near one: the Bernoulli case. Ann Stat 9:987–1001

    Google Scholar 

  • Kim E, Van Oyen MP (1998) Beyond the c μ rule: dynamic scheduling of a two-class loss queue. Math Method Oper Res 48:17–36

    Article  Google Scholar 

  • Kleinrock L (1965) A conservation law for a wide class of queueing disciplines. Naval Res Log 12:181–192

    Article  Google Scholar 

  • Kleinrock L (1975) Queueing systems, vol I: Theory. Wiley, New York

    Google Scholar 

  • Klimov GP (1974) Time-sharing service systems. I. Theory Probab Appl 19:532–551

    Article  Google Scholar 

  • Koopmans TC (1957) Three essays on the state of economic science. McGraw–Hill, New York, chap Allocation of resources and the price system

    Google Scholar 

  • Krishnan KR (1988) State-dependent allocation of job stream to parallel workstations. In: Proceedings of the 27th conference on decision and control. IEEE, Piscataway, pp 2332–2333

    Chapter  Google Scholar 

  • Krishnan KR (1990) Joining the right queue: a state-dependent decision rule. IEEE T Autom Contr 35:104–108

    Article  Google Scholar 

  • Lewis ME, Puterman ML (2002) Bias optimality. In: Feinberg EA, Schwartz A (ed) Handbook of Markov decision processes. Kluwer Academic, Boston, pp 89–111

    Google Scholar 

  • Lott C, Teneketzis D (2000) On the optimality of an index rule in multichannel allocation for single-hop mobile networks with multiple service classes. Probab Eng Inform Sc 14:259–297

    Article  Google Scholar 

  • Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv Appl Probab 33:76–98

    Article  Google Scholar 

  • Niño-Mora J (2002) Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Math Program 93:361–413

    Article  Google Scholar 

  • Niño-Mora J (2003) Restless bandit marginal productivity indices, diminishing returns, and scheduling a multiclass make-to-order/-stock queue. In: Proceedings of the 41st annual Allerton conference on communication, control and computing. Univ of Illinois at Urbana-Champaign, Monticello, pp 100–109

    Google Scholar 

  • Niño-Mora J (2005) A marginal productivity index policy for the finite-horizon multiarmed bandit problem. In: Proceedings of the 44th IEEE conference on decision and control and European control conference ECC 2005. IEEE, Piscataway, pp 1718–1722

    Chapter  Google Scholar 

  • Niño-Mora J (2006a) Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math Oper Res 31:50–84

    Article  Google Scholar 

  • Niño-Mora J (2006b) Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue. Queueing Syst 54:281–312

    Article  Google Scholar 

  • Niño-Mora J (2006c) Marginal productivity index policies for scheduling multiclass wireless transmissions. In: NGI 2006, proceedings of the 2nd EuroNGI conference on next generation Internet networks—design and engineering. IEEE, Piscataway, pp 342–349

    Chapter  Google Scholar 

  • Niño-Mora J (2007a) A (2/3)n 3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS J Comput, vol 19, DOI: 10.1287/ijoc.1060.0206

  • Niño-Mora J (2007b) Marginal productivity index policies for admission control and routing to parallel multi-server loss queues with reneging. In: Network control and optimization, proceedings of the first EuroFGI international conference NET-COOP 2007. Lect Notes Comput Sc, vol 4465. Springer, Berlin, pp 138–149

    Google Scholar 

  • Niño-Mora J (2007c) Marginal productivity index policies for scheduling multiclass delay-/loss-sensitive traffic with delayed state observation. In: NGI 2007, proceedings of the 3rd EuroNGI conference on next generation Internet networks—design and engineering for heterogeneity. IEEE, Piscataway, pp 209–217

    Google Scholar 

  • Niño-Mora J (2007d) A faster index algorithm and a computational study for bandits with switching costs. INFORMS J Comput (in press)

  • Niño-Mora J (2007e) Characterization and computation of restless bandit marginal productivity indices. In: SMCtools’07, proceedings from the 2007 workshop on tools for solving structured Markov chains. ACM, New York

    Google Scholar 

  • Niño-Mora J (2007f) Computing an index policy for bandits with switching penalties. In: SMCtools’07, proceedings from the 2007 workshop on tools for solving structured Markov chains. ACM, New York

    Google Scholar 

  • Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math Oper Res 24:293–305

    Google Scholar 

  • Peña-Pérez A, Zipkin P (1997) Dynamic scheduling rules for a multiproduct make-to-stock queue. Oper Res 45:919–930

    Google Scholar 

  • Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York

    Google Scholar 

  • Rothkopf MH (1966) Scheduling with random service times. Manag Sci 12:707–713

    Article  Google Scholar 

  • Sevcik KC (1974) Scheduling for minimum total loss using service time distributions. J ACM 21:66–75

    Article  Google Scholar 

  • Shanthikumar JG, Yao DD (1992) Multiclass queueing systems: polymatroidal structure and optimal scheduling control. Oper Res 40:S293–S299

    Google Scholar 

  • Smith WE (1956) Various optimizers for single-stage production. Naval Res Log 3:59–66

    Article  Google Scholar 

  • Soman CA, Van Donk DP, Gaalman G (2004) Combined make-to-order and make-to-stock in a food production system. Int J Prod Econ 90:223–235

    Article  Google Scholar 

  • Sparaggis PD, Cassandras CG, Towsley D (1993) On the duality between routing and scheduling systems with finite buffer space. IEEE T Autom Contr 38:1440–1446

    Article  Google Scholar 

  • Stidham S Jr (1985) Optimal control of admission to a queueing system. IEEE T Autom Contr 30:705–713

    Article  Google Scholar 

  • Tassiulas L, Ephremides A (1993) Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE T Inform Theory 39:466–478

    Article  Google Scholar 

  • Tsoucas P (1991) The region of achievable performance in a model of Klimov. Tech Rep RC16543, IBM TJ Watson Research Center, Yorktown Heights, New York, NY

  • Varaiya PP, Walrand JC, Buyukkoc C (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE T Autom Contr 30:426–439

    Article  Google Scholar 

  • Veatch MH, Wein LM (1996) Scheduling a multiclass make-to-stock queue: index policies and hedging points. Oper Res 44:634–647

    Article  Google Scholar 

  • Wang YG (1997) Error bounds for calculation of the Gittins indices. Aust J Stat 39:225–233

    Article  Google Scholar 

  • Wasserman KM, Bambos N (1996) Optimal server allocation to parallel queues with finite-capacity buffers. Probab Eng Inform Sc 10:279–285

    Article  Google Scholar 

  • Weber RR (1992) On the Gittins index for multiarmed bandits. Ann Appl Probab 2:1024–1033

    Google Scholar 

  • Weber RR, Weiss G (1990) On an index policy for restless bandits. J Appl Probab 27:637–648

    Article  Google Scholar 

  • Weber RR, Weiss G (1991) Addendum to “On an index policy for restless bandits”. Adv Appl Probab 23:429–430

    Article  Google Scholar 

  • Wein LM (1992) Dynamic scheduling of a multiclass make-to-stock queue. Oper Res 40:724–735

    Google Scholar 

  • Whittle P (1980) Multi-armed bandits and the Gittins index. J R Stat Soc B Met 42:143–149

    Google Scholar 

  • Whittle P (1988) Restless bandits: activity allocation in a changing world. In: Gani J (ed) A celebration of applied probability. J appl probab, vol 25A. Applied Probability Trust, Sheffield, pp 287–298

    Google Scholar 

  • Whittle P (1996) Optimal control: basics and beyond. Wiley, Chichester

    Google Scholar 

  • Winston W (1977) Optimality of the shortest line discipline. J Appl Probab 14:181–189

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to José Niño-Mora.

Additional information

This invited paper is discussed in the comments available at: http://dx.doi.org/10.1007/s11750-007-0026-z, http://dx.doi.org/10.1007/s11750-007-0027-y, http://dx.doi.org/10.1007/s11750-007-0028-x, http://dx.doi.org/10.1007/s11750-007-0029-9, http://dx.doi.org/10.1007/s11750-007-0030-3, http://dx.doi.org/10.1007/s11750-007-0031-2.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Niño-Mora, J. Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198 (2007). https://doi.org/10.1007/s11750-007-0025-0

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11750-007-0025-0

Keywords

Mathematics Subject Classification (2000)

Navigation