Preserving Grammatical Gender when Debiasing Word Embeddings in Spanish

Aitana Morote Martínez; Juan Pablo Consuegra Ayala; Elena Lloret Pastor

Authors: Aitana Morote Martínez, Juan Pablo Consuegra Ayala, Elena Lloret Pastor
Published in: Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 75, 2025 (issue dedicated to: Procesamiento del Lenguaje Natural, Revista nº 75, September 2025), pp. 211-222
Language: English
Parallel titles:
- Propuesta de método de conservación del género gramatical en embeddings de palabras para la mitigación de sesgos en español
Abstract
  Word embeddings are widely used in Natural Language Processing but often encode gender biases, which can lead to discriminatory outcomes. Several debiasing techniques exist, but they focus mainly on English and do not account for the complexities of languages with grammatical gender, such as Spanish. In this paper, we propose INLP-Gram, an algorithm designed to mitigate gender bias in Spanish word embeddings while preserving grammatical gender information. It adapts the Iterative Nullspace Projection (INLP) algorithm to take into account the morphological variation of languages with grammatical gender. We evaluate INLP-Gram using the Word Embedding Association Test (WEAT) and a grammatical gender classification test. Our results show that INLP-Gram effectively reduces gender bias while maintaining grammatical gender distinctions. This work advances bias mitigation techniques for word embeddings in morphologically rich languages.
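
The two building blocks named in the abstract, a nullspace projection and the WEAT effect size, can be illustrated with a small sketch. This is not the INLP-Gram implementation. It assumes a single hand-picked bias direction instead of the classifier directions that INLP learns iteratively, it makes no attempt to preserve grammatical gender, and it uses the sample standard deviation in the WEAT denominator. All vectors and names (`bias_direction`, `project_out`, the toy word sets) are hypothetical.

```python
import math
from statistics import mean, stdev


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def project_out(v, w):
    """Remove from v its component along w (one nullspace projection step)."""
    scale = dot(v, w) / dot(w, w)
    return [vi - scale * wi for vi, wi in zip(v, w)]


def association(w, A, B):
    """Mean cosine similarity of w to attribute set A minus that to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    spread = stdev(s_x + s_y)  # assumption: sample standard deviation
    if spread == 0:
        return 0.0  # no measurable association left
    return (mean(s_x) - mean(s_y)) / spread


# Toy 3-d "embeddings": the first axis plays the role of the bias direction.
bias_direction = [1.0, 0.0, 0.0]          # hypothetical, hand-picked
A = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1]]    # attribute set 1
B = [[-1.0, 0.1, 0.0], [-1.0, 0.0, 0.1]]  # attribute set 2
X = [[0.5, 1.0, 0.0], [0.6, 0.0, 1.0]]    # target set 1
Y = [[-0.5, 1.0, 0.0], [-0.6, 0.0, 1.0]]  # target set 2

before = weat_effect_size(X, Y, A, B)

# Project every vector onto the nullspace of the bias direction.
A_p, B_p, X_p, Y_p = (
    [project_out(v, bias_direction) for v in group] for group in (A, B, X, Y)
)
after = weat_effect_size(X_p, Y_p, A_p, B_p)
```

On this toy data the effect size is positive before the projection and vanishes afterwards, because the only difference between the groups lies along the removed direction. Real embeddings encode bias in several directions, which is why INLP repeats the projection with newly trained classifiers.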