Abstract
This paper surveys recent work by the author on the theoretical and algorithmic aspects of restless bandit indexation as well as on its application to a variety of problems involving the dynamic allocation of priority to multiple stochastic projects. The main aim is to present ideas and methods in an accessible form that can be of use to researchers addressing problems of such a kind. Besides building on the rich literature on bandit problems, our approach draws on ideas from linear programming, economics, and multi-objective optimization. In particular, it was motivated to address issues raised in the seminal work of Whittle (Restless bandits: activity allocation in a changing world. In: Gani J. (ed.) A Celebration of Applied Probability, J. Appl. Probab., vol. 25A, Applied Probability Trust, Sheffield, pp. 287–298, 1988) where he introduced the index for restless bandits that is the starting point of this work. Such an index, along with previously proposed indices and more recent extensions, is shown to be unified through the intuitive concept of “marginal productivity index” (MPI), which measures the marginal productivity of work on a project at each of its states. In a multi-project setting, MPI policies are economically sound, as they dynamically allocate higher priority to those projects where work appears to be currently more productive. Besides being tractable and widely applicable, a growing body of computational evidence indicates that such index policies typically achieve a near-optimal performance and substantially outperform benchmark policies derived from conventional approaches.
Similar content being viewed by others
References
Adan IJBF, van der Wal J (1998) Combining make to order and make to stock. OR Spektrum 20:73–81
Altman E, Shwartz A (1989) Optimal priority assignment: a time sharing approach. IEEE T Autom Contr 34:1098–1102
Altman E, Stidham S Jr (1995) Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Syst 21:267–291
Asawa M, Teneketzis D (1996) Multi-armed bandits with switching penalties. IEEE T Autom Contr 41:328–348
Bambos N, Michailidis G (2002) On parallel queuing with random server connectivity and routing constraints. Probab Eng Inform Sc 16:185–203
Banks JS, Sundaram RK (1994) Switching costs and the Gittins index. Econometrica 62:687–694
Bellman R (1956) A problem in the sequential design of experiments. Sankhyā 16:221–229
Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math Oper Res 21:257–306
Blackwell D (1962) Discrete dynamic programming. Ann Math Stat 33:719–726
Boxma OJ (1995) Static optimization of queueing systems. In: Agarwal RP (ed) Recent trends in optimization theory and applications. World sci ser appl anal, vol 5. World Sci Publ, River Edge, pp 1–16
Bradt RN, Johnson SM, Karlin S (1956) On sequential designs for maximizing the sum of n observations. Ann Math Stat 27:1060–1074
Chen H, Yao DD (1990) Optimal intensity control of a queueing system with state-dependent capacity limit. IEEE T Autom Contr 35:459–464
Clark JB (2005) The distribution of wealth: a theory of wages, interest and profits. Adamant media corporation (reissued from the edition published in 1908 by the Macmillan Company, New York, NY; first published: 1899)
Coffman EG Jr, Mitrani I (1980) A characterization of waiting time performance realizable by single-server queues. Oper Res 28:810–821
Combé MB, Boxma OJ (1994) Optimization of static traffic allocation policies. Theor Comput Sci 125:17–43
Cox DR, Smith WL (1961) Queues. Methuen, London
De Véricourt F, Karaesmen F, Dallery Y (2000) Dynamic scheduling in a make-to-stock system: a partial characterization of optimal policies. Oper Res 48:811–819
Derman C, Lieberman GJ, Ross SM (1980) On the optimal assignment of servers and a repairman. J Appl Probab 17:577–581
Dongarra JJ, Eijkhout V (2000) Numerical linear algebra algorithms and software. J Comput Appl Math 123:489–514
Dusonchet F, Hongler MO (2003a) Continuous-time restless bandit and dynamic scheduling for make-to-stock production. IEEE T Robotic Autom 19:977–990
Dusonchet F, Hongler MO (2003b) Optimal hysteresis for a class of deterministic deteriorating two-armed bandit problem with switching costs. Automatica 39:1947–1955
Ehsan N, Liu MY (2006) Optimal bandwidth allocation in a delay channel. IEEE J Sel Area Commun 24:1614–1626
Feinberg EA, Shwartz A (2002) Mixed criteria. In: Feinberg EA, Shwartz A (eds) Handbook of Markov decision processes: methods and applications. Kluwer, Boston, pp 209–230
Federgruen A, Groenevelt H (1988) Characterization and optimization of achievable performance in general queueing systems. Oper Res 36:733–741
Gass S, Saaty T (1955) The computational algorithm for the parametric objective function. Naval Res Log 2:39–46
Gittins JC (1979) Bandit processes and dynamic allocation indices (with discussion). J R Stat Soc B Met 41:148–177
Gittins JC (1989) Multi-armed bandit allocation indices. Wiley, Chichester
Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. In: Gani J, Sarkadi K, Vincze I (eds) Progress in statistics. European meeting of statisticians, Budapest, 1972. North-Holland, Amsterdam, pp 241–266
Haji R, Newell GF (1971) A relation between stationary queue and waiting-time distributions. J Appl Probab 8:617–620
Hernández-Lerma O, Hoyos-Reyes LF (2001) A multiobjective control approach to priority queues. Math Method Oper Res 53:265–277
Hordijk A, Koole G (1990) On the optimality of the generalized shortest queue policy. Probab Eng Inform Sci 4:477–487
Johri PK (1989) Optimality of the shortest line discipline with state-dependent service rates. Eur J Oper Res 41:157–161
Jun T (2004) A survey on the bandit problem with switching costs. De Economist 152:513–541
Kallmes MH, Cassandras CG (1995) Two approaches to optimal routing and admission control in systems with real-time traffic. J Optim Theory Appl 84:311–338
Kantorovich LV (1959) Ekonomicheskii raschet nailuchshego ispolzovania resursov. Academy of Sciences, USSR (Engl trans: The best use of economic resources, Harvard University Press, Cambridge, 1965)
Keilson J, Servi LD (1988) A distributional form of Little’s law. Oper Res Lett 7:223–227
Kelly FP (1981) Multi-armed bandits with discount factor near one: the Bernoulli case. Ann Stat 9:987–1001
Kim E, Van Oyen MP (1998) Beyond the c μ rule: dynamic scheduling of a two-class loss queue. Math Method Oper Res 48:17–36
Kleinrock L (1965) A conservation law for a wide class of queueing disciplines. Naval Res Log 12:181–192
Kleinrock L (1975) Queueing systems, vol I: Theory. Wiley, New York
Klimov GP (1974) Time-sharing service systems. I. Theory Probab Appl 19:532–551
Koopmans TC (1957) Three essays on the state of economic science. McGraw–Hill, New York, chap Allocation of resources and the price system
Krishnan KR (1988) State-dependent allocation of job stream to parallel workstations. In: Proceedings of the 27th conference on decision and control. IEEE, Piscataway, pp 2332–2333
Krishnan KR (1990) Joining the right queue: a state-dependent decision rule. IEEE T Autom Contr 35:104–108
Lewis ME, Puterman ML (2002) Bias optimality. In: Feinberg EA, Schwartz A (ed) Handbook of Markov decision processes. Kluwer Academic, Boston, pp 89–111
Lott C, Teneketzis D (2000) On the optimality of an index rule in multichannel allocation for single-hop mobile networks with multiple service classes. Probab Eng Inform Sc 14:259–297
Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv Appl Probab 33:76–98
Niño-Mora J (2002) Dynamic allocation indices for restless projects and queueing admission control: a polyhedral approach. Math Program 93:361–413
Niño-Mora J (2003) Restless bandit marginal productivity indices, diminishing returns, and scheduling a multiclass make-to-order/-stock queue. In: Proceedings of the 41st annual Allerton conference on communication, control and computing. Univ of Illinois at Urbana-Champaign, Monticello, pp 100–109
Niño-Mora J (2005) A marginal productivity index policy for the finite-horizon multiarmed bandit problem. In: Proceedings of the 44th IEEE conference on decision and control and European control conference ECC 2005. IEEE, Piscataway, pp 1718–1722
Niño-Mora J (2006a) Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math Oper Res 31:50–84
Niño-Mora J (2006b) Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue. Queueing Syst 54:281–312
Niño-Mora J (2006c) Marginal productivity index policies for scheduling multiclass wireless transmissions. In: NGI 2006, proceedings of the 2nd EuroNGI conference on next generation Internet networks—design and engineering. IEEE, Piscataway, pp 342–349
Niño-Mora J (2007a) A (2/3)n 3 fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS J Comput, vol 19, DOI: 10.1287/ijoc.1060.0206
Niño-Mora J (2007b) Marginal productivity index policies for admission control and routing to parallel multi-server loss queues with reneging. In: Network control and optimization, proceedings of the first EuroFGI international conference NET-COOP 2007. Lect Notes Comput Sc, vol 4465. Springer, Berlin, pp 138–149
Niño-Mora J (2007c) Marginal productivity index policies for scheduling multiclass delay-/loss-sensitive traffic with delayed state observation. In: NGI 2007, proceedings of the 3rd EuroNGI conference on next generation Internet networks—design and engineering for heterogeneity. IEEE, Piscataway, pp 209–217
Niño-Mora J (2007d) A faster index algorithm and a computational study for bandits with switching costs. INFORMS J Comput (in press)
Niño-Mora J (2007e) Characterization and computation of restless bandit marginal productivity indices. In: SMCtools’07, proceedings from the 2007 workshop on tools for solving structured Markov chains. ACM, New York
Niño-Mora J (2007f) Computing an index policy for bandits with switching penalties. In: SMCtools’07, proceedings from the 2007 workshop on tools for solving structured Markov chains. ACM, New York
Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math Oper Res 24:293–305
Peña-Pérez A, Zipkin P (1997) Dynamic scheduling rules for a multiproduct make-to-stock queue. Oper Res 45:919–930
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley, New York
Rothkopf MH (1966) Scheduling with random service times. Manag Sci 12:707–713
Sevcik KC (1974) Scheduling for minimum total loss using service time distributions. J ACM 21:66–75
Shanthikumar JG, Yao DD (1992) Multiclass queueing systems: polymatroidal structure and optimal scheduling control. Oper Res 40:S293–S299
Smith WE (1956) Various optimizers for single-stage production. Naval Res Log 3:59–66
Soman CA, Van Donk DP, Gaalman G (2004) Combined make-to-order and make-to-stock in a food production system. Int J Prod Econ 90:223–235
Sparaggis PD, Cassandras CG, Towsley D (1993) On the duality between routing and scheduling systems with finite buffer space. IEEE T Autom Contr 38:1440–1446
Stidham S Jr (1985) Optimal control of admission to a queueing system. IEEE T Autom Contr 30:705–713
Tassiulas L, Ephremides A (1993) Dynamic server allocation to parallel queues with randomly varying connectivity. IEEE T Inform Theory 39:466–478
Tsoucas P (1991) The region of achievable performance in a model of Klimov. Tech Rep RC16543, IBM TJ Watson Research Center, Yorktown Heights, New York, NY
Varaiya PP, Walrand JC, Buyukkoc C (1985) Extensions of the multiarmed bandit problem: the discounted case. IEEE T Autom Contr 30:426–439
Veatch MH, Wein LM (1996) Scheduling a multiclass make-to-stock queue: index policies and hedging points. Oper Res 44:634–647
Wang YG (1997) Error bounds for calculation of the Gittins indices. Aust J Stat 39:225–233
Wasserman KM, Bambos N (1996) Optimal server allocation to parallel queues with finite-capacity buffers. Probab Eng Inform Sc 10:279–285
Weber RR (1992) On the Gittins index for multiarmed bandits. Ann Appl Probab 2:1024–1033
Weber RR, Weiss G (1990) On an index policy for restless bandits. J Appl Probab 27:637–648
Weber RR, Weiss G (1991) Addendum to “On an index policy for restless bandits”. Adv Appl Probab 23:429–430
Wein LM (1992) Dynamic scheduling of a multiclass make-to-stock queue. Oper Res 40:724–735
Whittle P (1980) Multi-armed bandits and the Gittins index. J R Stat Soc B Met 42:143–149
Whittle P (1988) Restless bandits: activity allocation in a changing world. In: Gani J (ed) A celebration of applied probability. J appl probab, vol 25A. Applied Probability Trust, Sheffield, pp 287–298
Whittle P (1996) Optimal control: basics and beyond. Wiley, Chichester
Winston W (1977) Optimality of the shortest line discipline. J Appl Probab 14:181–189
Author information
Authors and Affiliations
Corresponding author
Additional information
This invited paper is discussed in the comments available at: http://dx.doi.org/10.1007/s11750-007-0026-z, http://dx.doi.org/10.1007/s11750-007-0027-y, http://dx.doi.org/10.1007/s11750-007-0028-x, http://dx.doi.org/10.1007/s11750-007-0029-9, http://dx.doi.org/10.1007/s11750-007-0030-3, http://dx.doi.org/10.1007/s11750-007-0031-2.
Rights and permissions
About this article
Cite this article
Niño-Mora, J. Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198 (2007). https://doi.org/10.1007/s11750-007-0025-0
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11750-007-0025-0
Keywords
- Priority allocation
- Stochastic scheduling
- Index policies
- Restless bandits
- Marginal productivity index
- Indexability
- Dynamic control of queues
- Control by price