IBERMAT - Corpus of Human and Machine Translated Multi-Domain Content in Basque, Catalan, Galician and Spanish: Description and Exploitation

Alicia Picazo Izquierdo; Ernesto Luis Estevanell Valladares; Ruslan Mitkov; Rafael Muñoz Guillena; Manuel Palomar Sanz



Published in: Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 75, 2025 (issue: Procesamiento del Lenguaje Natural, Revista nº 75, September 2025), pp. 337-348
Language: English
Parallel titles:
- IBERMAT - Corpus de Contenido Multidominio Traducido Manual y Automáticamente en Euskera, Catalán, Gallego y Español: Descripción y Explotación
Abstract
- Spanish (English translation)
  AI-generated content makes it harder to distinguish between human- and machine-produced text for tasks such as authorship verification, content moderation and quality assessment. This article presents IBERMAT, a new corpus of human and machine translations in three specialised domains (clinical, legal and literary) and in four official languages of Spain: Basque, Catalan, Galician and Spanish. The main goal is to detect whether a text was produced by humans or by machines. To this end, three approaches are evaluated: (1) traditional machine learning methods, (2) transformer-based language models with full fine-tuning and low-rank adaptation (LoRA) strategies, and (3) large language models (LLMs) in zero-shot scenarios. The results show that transformers outperform both traditional models and LLMs, although not by large margins. They also reflect the quality of the output of machine translation (MT) systems and the limitations of current models in detecting subtle nuances. Finally, machine-translated content may be harder to identify than AI-generated content in more general contexts.
- English
  Distinguishing between human- and machine-produced text is crucial for tasks such as authorship verification, content moderation and quality assessment. This paper introduces IBERMAT, a novel dataset of human and machine translations across three specialised domains (clinical, legal and literary) and four official languages in Spain (Basque, Catalan, Galician and Spanish), and outlines a case study of its exploitation: classifying the origin of a translation with a range of machine learning techniques. We evaluate three approaches: (1) traditional machine learning pipelines, (2) transformer-based language models fine-tuned with full and low-rank adaptation (LoRA) strategies, and (3) large language models (LLMs) for zero-shot classification. The results show that fine-tuned transformers outperform both traditional ML and zero-shot LLMs, though not by substantial margins. These results highlight both the increasing quality of machine translation (MT) output and the limitations of current models in detecting subtle distinctions, especially when translations may involve post-editing. Our findings also suggest that machine-translated content may be harder to identify than general AI-generated text.
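The abstract does not specify the features or classifier behind the traditional machine learning pipelines, so the following is only a minimal sketch under stated assumptions. It frames translation-origin detection as binary classification (human vs. machine) with a multinomial naive Bayes model over character trigrams, and scores predictions with macro-averaged F1. The class `NaiveBayesOrigin`, the helpers `char_ngrams` and `macro_f1`, the n-gram size, the smoothing value and the toy sentences are all hypothetical choices for illustration, not the authors' method, metric or data.

```python
"""Hypothetical sketch of a traditional ML baseline for translation-origin
classification (human vs. machine). NOT the IBERMAT authors' pipeline:
features (char n-grams), model (naive Bayes) and metric are assumptions."""
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesOrigin:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing over char n-grams."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, texts, labels):
        # Document counts per class give the class priors.
        self.class_counts = Counter(labels)
        # n-gram counts per class give the per-class likelihoods.
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text, self.n))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalised log-posterior per class; unseen n-grams are ignored."""
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)
            counts = self.feature_counts[label]
            denom = self.totals[label] + self.alpha * v
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (assumed metric, for illustration)."""
    f1s = []
    for lab in set(gold) | set(pred):
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Invented toy sentences and labels, NOT drawn from IBERMAT.
    train_texts = [
        "The patient was admitted with acute chest pain.",
        "The court dismissed the appeal on procedural grounds.",
        "The patient was admitted with an acute pain in the chest.",
        "The court rejected the appeal for procedural reasons.",
    ]
    train_labels = ["human", "human", "machine", "machine"]
    model = NaiveBayesOrigin(n=3, alpha=1.0).fit(train_texts, train_labels)
    test_texts = ["The appeal was dismissed by the court.",
                  "The patient had an acute pain in the chest."]
    test_labels = ["human", "machine"]
    preds = [model.predict(t) for t in test_texts]
    print(preds, macro_f1(test_labels, preds))
```

Character n-grams are a common language-agnostic feature choice for a multilingual setting such as Basque, Catalan, Galician and Spanish, because they need no language-specific tokeniser. The fine-tuned transformer and zero-shot LLM approaches described in the abstract would replace this model entirely rather than extend it.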