Multi-Armed Bandit Processes with Optimal Selection of the Operating Times

  • Authors: Ricardo Vélez Ibarrola, Pilar Ibarrola Muñoz
  • Published in: Test: An Official Journal of the Spanish Society of Statistics and Operations Research, ISSN-e 1863-8260, ISSN 1133-0686, Vol. 14, No. 1, 2005, pp. 239-255
  • Language: English
  • DOI: 10.1007/bf02595405
  • Abstract
    • A multi-armed bandit problem is considered in which, at each decision epoch, one must decide both the next project to be undertaken and the span of time to be spent on it, rather than reconsidering the choice of project at every stage. This extended model, inspired by sequentially planned decision procedures (Schmitz, 1993), is formulated in Section 1 and aims to exploit the cost reductions produced by longer periods devoted to the same activity. Following the method of Whittle (1980), Section 2 introduces a retirement option with a variable reward M, and Section 3 extends Gittins indices to this setting. A further relevant conclusion is that the optimal period of activity for each project does not depend on the retirement reward M. Finally, we show that the optimal strategy is to choose the project with the highest Gittins index.
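
The index-policy structure described in the abstract can be sketched in a few lines of code. The snippet below is an illustrative sketch only: the names `gittins_index` and `optimal_span` are hypothetical placeholders (a mean-plus-bonus surrogate and a simple growing span), not the indices or optimal operating times actually constructed in Sections 2-3 of the paper. It only shows the shape of the decision rule: at each epoch, pick the project with the highest index together with an operating time.

```python
# Hedged sketch (NOT the paper's construction): an index policy that, at each
# decision epoch, selects both the project to operate and the operating time.
from dataclasses import dataclass, field
import random


@dataclass
class Project:
    name: str
    rewards: list = field(default_factory=list)  # average reward per past operating period

    def pull(self, span: int) -> float:
        """Operate this project for `span` periods and return the total reward.
        Stub dynamics (i.i.d. uniform rewards); a real project evolves with its state."""
        total = sum(random.random() for _ in range(span))
        self.rewards.append(total / span)
        return total


def gittins_index(p: Project) -> float:
    """Hypothetical surrogate index (empirical mean plus a shrinking bonus);
    stands in for the Gittins index extended to variable operating times."""
    n = len(p.rewards)
    if n == 0:
        return float("inf")  # unexplored projects get priority
    return sum(p.rewards) / n + 1.0 / n


def optimal_span(p: Project, max_span: int = 5) -> int:
    """Placeholder for the optimal operating time; per the paper it does not
    depend on the retirement reward M. Here: operate longer as experience grows."""
    return min(max_span, 1 + len(p.rewards))


def run(projects, epochs=20):
    for _ in range(epochs):
        chosen = max(projects, key=gittins_index)  # highest-index project
        span = optimal_span(chosen)                # and its operating time
        chosen.pull(span)
    return projects


if __name__ == "__main__":
    run([Project("A"), Project("B"), Project("C")])
```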

