Documat


On the Impact of Syntactic Infusion for Gender Categorization Across Contextual Dimensions

  • Authors: Inés Veiga Menéndez, Alberto Muñoz-Ortiz, David Vilares Calvo
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 74, 2025, pp. 159-178
  • Language: English
  • Parallel titles:
    • Sobre el impacto de la integración sintáctica en la categorización de género a través de dimensiones contextuales
  • Abstract

      This paper investigates how incorporating syntactic information can enhance the categorization of text into multiple gender dimensions, defined by our own identity (as category), the person we are addressing (to category), or the individual we are discussing (about category). Specifically, we explore the use of dependency grammars to integrate explicit syntactic representations while leveraging the strengths of pre-trained masked language models (MLMs). Our goal is to determine whether dependency grammars add value beyond the implicit syntactic understanding already captured by MLMs. We begin by establishing a baseline using standard MLMs. Next, we propose a neural architecture that explicitly integrates dependency-based structures into this baseline, enabling a comparative analysis of performance and its variations. Finally, in addition to evaluating the results, we analyze the training dynamics of the two proposed variants to provide additional insight into their behavior during the fine-tuning stage. Explicit syntactic information boosts performance in single-task setups, though its gains fade in multitask scenarios.
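As a rough illustration of the fusion idea the abstract describes (concatenating explicit dependency-relation features with contextual token representations, then pooling and classifying each gender dimension), here is a minimal sketch. Everything in it is hypothetical: the relation inventory, vector sizes, random stand-in embeddings, and per-dimension linear heads are illustrative choices, not the authors' actual architecture.

```python
import random

random.seed(0)

# Toy inventory of dependency relations and toy vector sizes; in a real
# system the relations would come from a dependency parser and the token
# vectors from a pre-trained MLM.
DEPRELS = ["root", "nsubj", "obj", "det", "amod"]
D_MLM, D_SYN, N_CLASSES = 8, 4, 3

def rand_vec(n):
    """Random stand-in for a learned embedding or weight row."""
    return [random.uniform(-1, 1) for _ in range(n)]

# One (would-be learned) embedding per dependency relation label.
deprel_table = {r: rand_vec(D_SYN) for r in DEPRELS}

def encode(token_embs, deprels):
    """Concatenate each token's MLM vector with its relation embedding,
    then mean-pool into one sentence vector of size D_MLM + D_SYN."""
    rows = [e + deprel_table[r] for e, r in zip(token_embs, deprels)]
    dim = D_MLM + D_SYN
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]

# One linear head per gender dimension (a single-task-style setup).
heads = {d: [rand_vec(N_CLASSES) for _ in range(D_MLM + D_SYN)]
         for d in ("as", "to", "about")}

def predict(token_embs, deprels):
    """Return a predicted class index for each gender dimension."""
    z = encode(token_embs, deprels)
    out = {}
    for dim, W in heads.items():
        scores = [sum(z[i] * W[i][c] for i in range(len(z)))
                  for c in range(N_CLASSES)]
        out[dim] = scores.index(max(scores))
    return out

# Toy input: three tokens with fake MLM embeddings and parsed relations.
tokens = [rand_vec(D_MLM) for _ in range(3)]
preds = predict(tokens, ["nsubj", "root", "obj"])
```

The concatenate-and-pool fusion here is only one plausible way to inject explicit syntax; the paper's point is that whatever the mechanism, such explicit structure helped most in single-task configurations like the per-dimension heads above.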

