Efficiency analysis trees

Miriam Esteve Campello

Author: Miriam Esteve Campello
Thesis supervisors: Juan Aparicio Baeza (supervisor), Alejandro Rabasa Dolado (co-supervisor)
Defended: at the Universidad Miguel Hernández de Elche (Spain) in 2022
Language: Spanish
Thesis examination committee: Ernestina Menasalvas (chair), Antonio Peñalver Benavent (secretary), Jose H. Dulá (member)
Links
- Open-access thesis available at: RediUMH
Abstract
- Measuring technical efficiency by first estimating a production frontier has long been a relevant topic in the literature on production theory and engineering. Over the last forty years, many parametric and non-parametric approaches have been introduced to estimate production frontiers from a given set of data. However, few of these methodologies are based on machine learning techniques, even though machine learning is a growing field of research. This thesis introduces a new methodology based on regression trees to estimate production frontiers that satisfy the fundamental postulates of microeconomics, such as free disposability. This new approach, known as Efficiency Analysis Trees (EAT), shares some similarities with the Free Disposal Hull (FDH) technique. Unlike FDH, however, EAT overcomes the overfitting problem by using cross-validation to prune the deep tree obtained in a first stage. The performance of EAT is measured through Monte Carlo simulations, which show that the new approach reduces the mean squared error in estimating the true frontier by between 13% and 70% compared with standard FDH.
  
  However, individual decision trees have some drawbacks: (1) they do not usually achieve a high level of prediction accuracy, and (2) they can lack robustness, that is, a small change in the data can cause a large change in the final structure of the fitted tree. Random Forest, an ensemble learning method that builds a multitude of decision trees at training time and aggregates the information from the individual trees into a final prediction, is capable of overcoming these limitations (James et al., 2013). Accordingly, this thesis adapts the Random Forest technique (Breiman, 2001) to estimate production frontiers and technical efficiency, an approach denoted RF+EAT. To do this, decision tree models are applied to estimate non-overfitted production possibility sets that satisfy free disposability in the context of FDH. The new approach has three main implications. First, the resulting technical efficiency estimates are robust to resampling of the data and of the input variables. Second, a method is suggested to determine the importance of the input variables in the model, which allows a ranking of the inputs to be established. Third, if the ratio of the sample size to the number of variables (inputs and outputs) is low or moderately low, the standard efficiency models in the literature may evaluate a considerable number of units as technically efficient, especially in the case of FDH. This lack of discrimination is often referred to in the literature as the "curse of dimensionality." This thesis shows that the Random Forest technique can also be considered a remedy for this type of problem.
  
  From a computational point of view, the algorithm used by EAT relies on a heuristic to select the next node to be split while the corresponding decision tree is grown. However, as shown in this thesis, this heuristic does not always produce the minimum mean squared error among all the possible trees that could be grown. Therefore, one of the main objectives is to improve the accuracy of the production function estimator generated by EAT by resorting to backtracking techniques (Baase, 2009; Horowitz and Sahni, 1978). In particular, we combine the idea behind the heuristic approach with the potential of backtracking (Pearl, 1984; Tarjan, 1972) to improve the quality of the EAT-based production function estimator. In addition, this new approach makes it possible to reduce the computational load of the standard backtracking techniques applied to the EAT methodology, as shown in the simulation experiments carried out.
  
  Also on the computational side, this thesis develops a new R package, named eat, which includes functions to estimate the production frontiers and the technical efficiency measures of EAT and RF+EAT. The package covers the input- and output-oriented radial measures, the input- and output-oriented Russell measures, the directional distance function and the weighted additive model. For model visualization, the package provides graphical representations of the production frontier as tree structures and produces rankings of input variable importance. The operation of the package is described in this thesis using a real database.
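
As a point of reference for the FDH baseline mentioned in the abstract, the following is a minimal Python sketch of the standard single-output FDH frontier estimate and the corresponding output-oriented radial score. It is not the thesis's EAT algorithm and not the API of the eat R package; the function names `fdh_frontier` and `output_efficiency` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def fdh_frontier(X: Sequence[Sequence[float]], y: Sequence[float],
                 x0: Sequence[float]) -> float:
    """FDH estimate of the maximum output attainable with inputs x0.

    Under free disposability, any observed unit that uses no more of
    every input than x0 shows that its output is attainable at x0, so
    the estimate is the largest such observed output.
    """
    attainable = [
        yi for xi, yi in zip(X, y)
        if all(a <= b for a, b in zip(xi, x0))
    ]
    # Assumption for this sketch: return 0.0 when no observed unit
    # is dominated by x0 in inputs.
    return max(attainable) if attainable else 0.0


def output_efficiency(X: Sequence[Sequence[float]], y: Sequence[float],
                      k: int) -> float:
    """Output-oriented radial score of unit k (1.0 means FDH-efficient)."""
    return fdh_frontier(X, y, X[k]) / y[k]


# Toy data (hypothetical): one input, one output, four units.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 2.0, 5.0]

scores = [output_efficiency(X, y, k) for k in range(len(X))]
```

Because the FDH frontier is a step function that passes through every non-dominated observation, most units in small samples tend to receive a score of 1.0. This is the overfitting and lack of discrimination that the abstract says EAT and RF+EAT are designed to mitigate.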