Multi-label Text Classification for Public Procurement in Spanish

María Navas Loro; Daniel Garijo Verdejo; Oscar Corcho García

Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 69, 2022, pp. 73-82
Language: English
Parallel titles:
- Clasificación multi-etiqueta de textos de licitaciones públicas en español

Abstract
Public procurement accounts for 14% of the annual budget of the governments of the European Union. In Europe, contracting processes are classified using Common Procurement Vocabulary (CPV) codes, a taxonomy designed to facilitate statistical reporting, search, and the creation of alerts for potential bidders. CPV codes are usually assigned manually by the public employees in charge of each contracting process. However, CPV classification is not a trivial task: there are more than 9,000 CPV categories, and they are often assigned following heterogeneous criteria. In this paper we present a CPV classifier that takes the textual description of a contracting process as input and assigns codes from the 45 top-level CPV categories. We work only with texts in Spanish, although our approach may be easily extended to other languages. Our results improve on the state of the art (a 10% improvement in F1-score) and are available online.
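To make the label space in the abstract more concrete, the following minimal sketch collapses full CPV codes into top-level categories and encodes them as a multi-label target vector. It is not the authors' implementation. It assumes that the two leading digits of an 8-digit CPV code identify its top-level division and that a check digit may follow after a hyphen. The function names, the sample codes, and the three-entry label space are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def top_level_division(cpv_code: str) -> str:
    """Return the two-digit division prefix of a CPV code such as '45233120-6'."""
    digits = cpv_code.strip().split("-")[0]  # drop the optional check digit
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"not an 8-digit CPV code: {cpv_code!r}")
    return digits[:2]


def division_labels(cpv_codes: Iterable[str]) -> List[str]:
    """Collapse the CPV codes of one tender into a sorted list of unique divisions."""
    return sorted({top_level_division(code) for code in cpv_codes})


def multi_hot(labels: List[str], label_space: List[str]) -> List[int]:
    """Encode a label set as a 0/1 vector over a fixed, ordered label space."""
    present = set(labels)
    return [1 if label in present else 0 for label in label_space]


# Hypothetical tender annotated with three CPV codes (illustrative values).
tender_codes = ["45233120-6", "45000000-7", "72000000-5"]
labels = division_labels(tender_codes)

# Toy label space; a real setup would list all 45 top-level categories.
label_space = ["03", "45", "72"]
vector = multi_hot(labels, label_space)
```

A vector of this kind is what a multi-label text classifier would be trained to predict from the tender description, with one position per top-level category.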