Abstract of Multi-Armed Bandit Processes with Optimal Selection of the Operating Times

We consider a multi-armed bandit problem in which, at each decision epoch, the decision maker chooses both the next project to be undertaken and the span of time to be spent on it, instead of reconsidering the choice of project at every stage. This extended model, inspired by sequentially planned decision procedures (Schmitz, 1993), is formulated in Section 1 and seeks to exploit the cost reduction obtained by dedicating longer periods to the same activity. Following the method of Whittle (1980), Section 2 introduces a retirement option with a variable reward M, and Section 3 extends Gittins indices to this case. A relevant conclusion is that the optimal period of activity for each project does not depend on the retirement reward M. Finally, we show that the optimal strategy is to choose the project with the highest Gittins index.
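
As a rough illustration of the retirement-option idea mentioned above, the following minimal sketch computes a Gittins index for the classical one-step setting only (a finite-state Markov project with discount factor beta), not for the extended model with operating times studied in the paper. The function names, the bisection tolerance, and the iteration count are assumptions made for this example. The index of a state is taken as (1 - beta) times the smallest retirement reward M at which continuing is no longer strictly better than retiring.

```python
def retirement_values(P, r, beta, M, iters=2000):
    """Value iteration for one project with the option to retire for a lump sum M.

    P is a row-stochastic transition matrix (list of lists), r the per-state
    rewards, beta the discount factor in (0, 1). Hypothetical helper.
    """
    n = len(r)
    V = [M] * n
    for _ in range(iters):
        V = [
            max(M, r[i] + beta * sum(p * v for p, v in zip(P[i], V)))
            for i in range(n)
        ]
    return V


def gittins_index(P, r, beta, state, tol=1e-8):
    """Bisection on the retirement reward M (classical setting, an assumption)."""
    lo = min(r) / (1.0 - beta)  # retiring is never better below this level
    hi = max(r) / (1.0 - beta)  # continuing is never better above this level
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = retirement_values(P, r, beta, M)
        cont = r[state] + beta * sum(p * v for p, v in zip(P[state], V))
        if cont > M:
            lo = M  # continuing is still strictly better: the index lies higher
        else:
            hi = M  # retiring is at least as good: the index lies lower
    return (1.0 - beta) * hi


def choose_project(indices):
    """Index policy: pick the project whose current state has the largest index."""
    return max(range(len(indices)), key=lambda k: indices[k])
```

With one index computed per project, `choose_project` mirrors the index rule stated in the abstract; how the operating time enters the index in the extended model is not reproduced here.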
