BOE-XSUM: Extreme Summarization in Clear Language of Spanish Legal Decrees and Notifications

Andrés Fernández García; Javier de la Rosa; Julio Gonzalo Arroyo; Roser Morante; Enrique Amigó; Alejandro Benito Santos; Jorge Carrillo de Albornoz; Víctor Fresno Fernández; Adrián Ghajari Espinosa; Guillermo Marco Remón; Laura Plaza Morales; Eva Sánchez Salido

Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 75, 2025 (Issue dedicated to: Procesamiento del Lenguaje Natural, Journal No. 75, September 2025), pp. 263-282
Language: English
Parallel titles:
- BOE-XSUM: Resúmenes Extremos en Lenguaje Claro de Decretos y Notificaciones Legales en Español
Abstract
- Spanish (translated)
  The ability to summarize long documents concisely is increasingly important in everyday life because of information overload, yet there is a notable scarcity of such summaries for Spanish documents in general, and in the legal domain in particular. In this work, we present BOE-XSUM, a dataset of 3,648 extremely short plain-language summaries created from entries of the Boletín Oficial del Estado (BOE). The dataset contains both the summaries and the original texts, labeled with document type. We also report results from fine-tuning and zero-shot experiments with generative models. Our results indicate that generative models supervised through fine-tuning perform significantly better than generative models used without supervision, even when the fine-tuned models are smaller. The best fine-tuned model in our experiments, BERTIN GPT-J 6B (32-bit precision), scores 24% higher than the best unsupervised model, DeepSeek-R1 (41.6% vs. 33.5%).
- English
  The ability to summarize long documents succinctly is increasingly important in daily life due to information overload, yet there is a notable lack of such summaries for Spanish documents in general, and in the legal domain in particular. In this work, we present BOE-XSUM, a curated dataset comprising 3,648 concise, plain-language summaries of documents sourced from Spain’s "Boletín Oficial del Estado" (BOE), the State Official Gazette. Each entry in the dataset includes a short summary, the original text, and its document type label. We evaluate the performance of medium-sized large language models (LLMs) fine-tuned on BOE-XSUM, comparing them to general-purpose generative models in a zero-shot setting. Results show that fine-tuned models significantly outperform their non-specialized counterparts. Notably, the best-performing model, BERTIN GPT-J 6B (32-bit precision), achieves a 24% relative performance gain over the top zero-shot model, DeepSeek-R1 (accuracies of 41.6% vs. 33.5%).
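The 24% figure in the abstract appears to be a relative gain rather than a difference in percentage points. The minimal sketch below checks that reading using only the two accuracies quoted in the abstract; the function name `relative_gain` and the variable names are illustrative assumptions, not taken from the paper.

```python
def relative_gain(best: float, baseline: float) -> float:
    """Relative improvement of `best` over `baseline`, as a percentage."""
    return (best - baseline) / baseline * 100.0


# Accuracies quoted in the abstract (percent).
fine_tuned = 41.6  # BERTIN GPT-J 6B (32-bit precision), fine-tuned
zero_shot = 33.5   # DeepSeek-R1, zero-shot

absolute = fine_tuned - zero_shot                 # difference in percentage points
relative = relative_gain(fine_tuned, zero_shot)   # gain relative to the baseline

print(f"absolute gain: {absolute:.1f} points")
print(f"relative gain: {relative:.1f}%")
```

Under this reading, the fine-tuned model is about 8 percentage points ahead in absolute terms, which corresponds to roughly a 24% improvement relative to the zero-shot baseline.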