Documat


Joint outlier detection and variable selection using discrete optimization

  • Authors: Mahdi Jammal, Stephane Canu, Maher Abdallah
  • Published in: Sort: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 45, No. 1, 2021, pp. 47-66
  • Language: English
  • Abstract
    • In regression, the quality of estimators is known to be very sensitive to the presence of spurious variables and outliers. Unfortunately, this is a frequent situation when dealing with real data. To handle outlier proneness and achieve variable selection, we propose a robust method that performs the outright rejection of discordant observations together with the selection of relevant variables. A natural way to define the corresponding optimization problem is to use the ℓ0 norm and recast it as a mixed integer optimization problem. To retrieve its global solution more efficiently, we suggest the use of additional constraints as well as a clever initialization. To this end, an efficient and scalable non-convex proximal alternate algorithm is introduced. An empirical comparison between the ℓ0 norm approach and its ℓ1 relaxation is presented as well. Results on both synthetic and real data sets show that the mixed integer programming approach and its discrete first order warm start provide high quality solutions.
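
      The abstract does not state the formulation, so the following is only a minimal sketch under assumptions, not the authors' algorithm. It assumes the mean-shift outlier model, minimizing 0.5 * ||y - X beta - o||^2 subject to ||beta||_0 <= k_vars and ||o||_0 <= k_out, and alternates a hard-thresholded gradient step on beta with an exact hard-thresholded update of o. The names `hard_threshold`, `alternating_l0`, `k_vars`, and `k_out` are hypothetical and chosen for illustration.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest.

    This is the Euclidean projection onto the set {v : ||v||_0 <= k}.
    """
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out


def alternating_l0(X, y, k_vars, k_out, n_iter=500):
    """Alternating projected-gradient sketch for the assumed problem

        min 0.5 * ||y - X @ beta - o||^2
        s.t. ||beta||_0 <= k_vars, ||o||_0 <= k_out

    beta holds the regression coefficients; a nonzero o[i] flags
    observation i as an outlier (assumed mean-shift model).
    """
    n, p = X.shape
    # Step size 1/L, with L the squared spectral norm of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    o = np.zeros(n)
    for _ in range(n_iter):
        # Gradient step on beta, then projection onto the sparsity set.
        grad = X.T @ (X @ beta + o - y)
        beta = hard_threshold(beta - step * grad, k_vars)
        # For fixed beta, the best o keeps the k_out largest residuals.
        o = hard_threshold(y - X @ beta, k_out)
    return beta, o


if __name__ == "__main__":
    # Synthetic example: 2 relevant variables, 5 shifted observations.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 4] + 0.1 * rng.standard_normal(100)
    y[:5] += 15.0
    beta, o = alternating_l0(X, y, k_vars=2, k_out=5)
    print("selected variables:", np.flatnonzero(beta))
    print("flagged observations:", np.flatnonzero(o))
```

      A local method of this kind has no global optimality guarantee. In the spirit of the abstract, its output would serve as a warm start for a mixed integer solver rather than as a final answer.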

  • References
    • Alfons, A., Croux, C. and Gelper, S. (2013). Sparse least trimmed squares regression for analyzing high-dimensional large data sets....
    • Bertsimas, D., King, A. and Mazumder, R. (2015). Best subset selection via a modern optimization lens. Annals of Statistics, 47, 2324–2354.
    • Bolte, J., Sabach, S. and Teboulle, M. (2014). Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical...
    • Campos, G. O., Zimek, A., Sander, J., Campello, R. J., Micenková, B., Schubert, E., Assent, I. and Houle, M. E. (2016). On the evaluation...
    • Chen, Y., Caramanis, C. and Mannor, S. (2013). Robust sparse regression under adversarial corruption. In International Conference on Machine...
    • Dalalyan, A. S. and Thompson, P. (2019). Outlier-robust estimation of a sparse linear model using ℓ1-penalized Huber’s M-estimator. arXiv...
    • Giloni, A. and Padberg, M. (2002). Least trimmed squares regression, least median squares regression, and mathematical programming. Mathematical...
    • Hastie, T., Tibshirani, R. and Tibshirani, R. J. (2017). Extended comparisons of best subset selection, forward stepwise selection, and the...
    • Hodge, V. and Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22, 85–126.
    • Miller, A. (2002). Subset selection in regression. CRC Press.
    • Miyashiro, R. and Takano, Y. (2015). Subset selection by Mallows' Cp: A mixed integer programming approach. Expert Systems with Applications,...
    • Nguyen, N. H. and Tran, T. D. (2013). Robust lasso with missing and grossly corrupted observations. IEEE Transactions on Information Theory,...
    • Öllerer, V., Alfons, A. and Croux, C. (2016). The shooting S-estimator for robust regression. Computational Statistics, 31, 829–844.
    • Parikh, N. and Boyd, S. P. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1, 127–239.
    • Rousseeuw, P. J. and Hubert, M. (2018). Anomaly detection by robust statistics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge...
    • Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection, Volume 589. John Wiley & Sons.
    • She, Y. and Owen, A. B. (2011). Outlier detection using nonconvex penalized regression. Journal of the American Statistical Association, 106,...
    • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological),...
    • Tibshirani, R., Wainwright, M. and Hastie, T. (2015). Statistical Learning with Sparsity: the Lasso and Generalizations. Chapman and Hall/CRC.
    • Wang, H., Li, G. and Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business...
    • Yang, M., Xu, L., White, M., Schuurmans, D. and Yu, Y.-l. (2010). Relaxed clipping: A global training method for robust regression and classification....
