Markov control models with unknown random state–action-dependent discount factors

J. Adolfo Minjárez-Sosa ^[1]
1. [1] Universidad de Sonora, Mexico
Source: Top, e-ISSN 1863-8279, ISSN 1134-5764, Vol. 23, No. 3, 2015, pp. 743-772
Language: English
DOI: 10.1007/s11750-015-0360-5
Abstract
- The paper deals with a class of discounted discrete-time Markov control models with non-constant discount factors of the form $\tilde{\alpha}(x_n, a_n, \xi_{n+1})$, where $x_n$, $a_n$, and $\xi_{n+1}$ are the state, the action, and a random disturbance at time $n$, respectively, taking values in Borel spaces. Assuming that the one-stage cost is possibly unbounded and that the distributions of $\xi_n$ are unknown, we study the corresponding optimal control problem under two settings. In the first, we assume that the random disturbance process $\{\xi_n\}$ is formed by observable independent and identically distributed random variables, and we introduce an estimation and control procedure to construct strategies. In the second, $\{\xi_n\}$ is assumed to be non-observable, with distributions that may change from stage to stage; in this case the problem is studied as a minimax control problem in which the controller has an opponent selecting the distribution of the corresponding random disturbance at each stage.
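
To make the role of the state-action-dependent discount factor concrete, the performance index for a model of this kind can be sketched as below. This is an illustrative formulation under assumed notation, not a quotation from the paper: $c$ (the one-stage cost), $\pi$ (a control policy), $x$ (the initial state), $\Gamma_n$ (the discount accumulated up to stage $n$), and $V$ (the total expected discounted cost) are names introduced here for illustration only.

```latex
% Assumed notation (not taken from the abstract): c, \pi, x, \Gamma_n, V.
% Accumulated discount: the product of the random discount factors
% incurred before stage n.
\[
  \Gamma_0 := 1, \qquad
  \Gamma_n := \prod_{k=0}^{n-1} \tilde{\alpha}(x_k, a_k, \xi_{k+1}),
  \quad n \ge 1 .
\]
% Total expected discounted cost under policy \pi from initial state x.
\[
  V(\pi, x) := E_x^{\pi}\!\left[ \sum_{n=0}^{\infty} \Gamma_n \, c(x_n, a_n) \right].
\]
```

When the discount factor is a constant, $\tilde{\alpha} \equiv \alpha \in (0,1)$, the accumulated discount reduces to $\Gamma_n = \alpha^n$ and the index above becomes the usual discounted cost criterion.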