
Documat


Abstract of Robust methods based on shrinkage

Elisa Cabana Garceran del Vall

In this thesis, robust methods based on the notion of shrinkage are proposed for outlier detection and robust regression. A collection of robust Mahalanobis distances is proposed for multivariate outlier detection. The robust intensity and scaling factors needed to define the shrinkage of the robust estimators used in the distances are estimated optimally. Properties such as affine equivariance and the breakdown value are investigated. The performance of the proposal is illustrated through comparison with other robust techniques from the literature, in a simulation study and on a real breast cancer data set. The robust alternatives are also reviewed, highlighting their advantages and disadvantages. The behavior when the underlying distribution is heavy-tailed or skewed shows that the proposed method remains appropriate when the common assumption of normality is violated. The high true positive rates and low false positive rates in the vast majority of cases, together with the significantly smaller computational times, show the advantages of the proposal.
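The general idea of a shrinkage-based Mahalanobis distance can be sketched as follows. This is an illustrative sketch only, not the thesis estimator: the thesis estimates the shrinkage intensity and scaling factors optimally and from robust ingredients, whereas here the intensity is a fixed hypothetical value and the shrinkage target is a simple scaled identity, with the classical 0.975 chi-square cut-off for flagging.

```python
import numpy as np
from scipy.stats import chi2

def shrinkage_mahalanobis(X, intensity=0.1):
    """Squared Mahalanobis distances from a shrinkage covariance estimate.

    `intensity` is a fixed illustrative value; the thesis estimates the
    intensity and scaling factors optimally.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    # Shrink the sample covariance toward a scaled-identity target.
    target = (np.trace(S) / p) * np.eye(p)
    S_shrunk = (1 - intensity) * S + intensity * target
    diff = X - mu
    inv = np.linalg.inv(S_shrunk)
    # Quadratic form diff_i' * inv * diff_i for every observation i.
    return np.einsum("ij,jk,ik->i", diff, inv, diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:5] += 8.0                       # plant five clear outliers
d2 = shrinkage_mahalanobis(X)
cutoff = chi2.ppf(0.975, df=3)     # classical 0.975 chi-square cut-off
flagged = np.where(d2 > cutoff)[0]
print(flagged)
```

Even this non-robust sketch flags the five planted outliers; the point of the robust shrinkage estimators in the thesis is to keep doing so when the contamination is heavier or more subtle.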

On the other hand, a robust estimator, also based on the notion of shrinkage, is proposed for the parameters that characterize the linear regression problem. A thorough simulation study investigates its efficiency under Normal and heavy-tailed errors, its robustness under contamination, its computational times, and its affine equivariance and breakdown value. The estimator is compared with the classical Ordinary Least Squares (OLS) approach and with the robust alternatives from the literature, which are also briefly reviewed in the thesis. Two classical data sets often used in the literature and a real socio-economic data set on the Living Environment Deprivation (LED) of areas in Liverpool (UK) are studied. The results from the simulations and the real data examples show the advantages of the proposed robust regression estimator. With the LED data set it is also shown that the proposed robust regression method outperforms the machine learning techniques previously applied to these data, with the added advantage of interpretability.
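One way a shrinkage-based regression estimator can be built is as a plug-in: estimate the joint location and covariance of (X, y) with a shrinkage estimator, then read the slope and intercept off the covariance blocks. The sketch below follows that plug-in recipe under assumptions of its own (fixed intensity, scaled-identity target, non-robust sample moments); it is not the thesis estimator, which derives the intensity and scaling optimally from robust ingredients.

```python
import numpy as np

def shrinkage_regression(X, y, intensity=0.05):
    """Plug-in linear regression from a shrinkage covariance of (X, y).

    Illustrative sketch: `intensity` is fixed and the sample moments are
    not robust; the thesis estimator chooses these quantities optimally.
    """
    Z = np.column_stack([X, y])
    q = Z.shape[1]
    mu = Z.mean(axis=0)
    S = np.cov(Z, rowvar=False)
    target = (np.trace(S) / q) * np.eye(q)
    S_sh = (1 - intensity) * S + intensity * target
    p = X.shape[1]
    # Slope from the covariance blocks, intercept from the locations.
    beta = np.linalg.solve(S_sh[:p, :p], S_sh[:p, p])
    intercept = mu[p] - mu[:p] @ beta
    return intercept, beta

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=500)
b0, b = shrinkage_regression(X, y)
print(b0, b)
```

Note the characteristic effect of shrinkage: the recovered slopes are pulled slightly toward zero relative to the true coefficients (2, -1), trading a little bias for stability.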

Furthermore, an adaptive threshold, depending on the sample size and the dimension of the data, is introduced for the proposed robust Mahalanobis distance based on shrinkage estimators. This cut-off differs from the classical choice of the 0.975 chi-square quantile and yields a more accurate method for detecting multivariate outliers. A simulation study checks the performance improvement of the new cut-off over the classical one. The adjusted quantile shows improved performance, even when the underlying distribution is heavy-tailed or skewed. The method is illustrated on the LED data set, and the results demonstrate the additional advantages of the adaptive threshold for the regression problem.
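The motivation for an (n, p)-dependent cut-off can be illustrated by Monte Carlo: for moderate sample sizes, the finite-sample 0.975 quantile of estimated squared Mahalanobis distances under clean Gaussian data does not coincide with the asymptotic chi-square quantile. The sketch below calibrates a cut-off by simulation; the thesis instead derives an adjusted quantile as a function of n and p for its shrinkage-based distances, so this is a hypothetical stand-in for that construction.

```python
import numpy as np
from scipy.stats import chi2

def simulated_cutoff(n, p, level=0.975, reps=200, seed=0):
    """Monte Carlo cut-off for squared Mahalanobis distances at (n, p).

    Illustrative only: averages the empirical `level` quantile of the
    distances over `reps` clean Gaussian samples of shape (n, p).
    """
    rng = np.random.default_rng(seed)
    q = np.empty(reps)
    for r in range(reps):
        X = rng.normal(size=(n, p))
        mu = X.mean(axis=0)
        inv = np.linalg.inv(np.cov(X, rowvar=False))
        d2 = np.einsum("ij,jk,ik->i", X - mu, inv, X - mu)
        q[r] = np.quantile(d2, level)
    return q.mean()

c_adapt = simulated_cutoff(n=50, p=5)
c_chi2 = chi2.ppf(0.975, df=5)
print(c_adapt, c_chi2)
```

For n = 50 and p = 5 the calibrated cut-off falls below the asymptotic chi-square value, so using the chi-square quantile at this sample size misplaces the detection boundary, which is exactly the gap an (n, p)-adaptive threshold is meant to close.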

