RoBERTime: A novel model for the detection of temporal expressions in Spanish

Lourdes Araujo Serna; Juan Martínez Romo; Alejandro Sánchez Castro Fernández

Authors: Lourdes Araujo Serna, Juan Martínez Romo, Alejandro Sánchez Castro Fernández
Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 70, 2023, pp. 39-51
Language: Spanish
Abstract
  Temporal expressions are all those words that refer to time. Detecting or extracting them is a complex task, since it depends on the domain of the text, the language, and the way the expressions are written. Their study in Spanish, and more specifically in the clinical domain, is scarce, mainly due to the lack of annotated corpora. In this paper we propose the use of large language models to address the task, comparing the performance of five models with different characteristics. After a process of experimentation and fine-tuning, we create a new model, RoBERTime, for the detection of temporal expressions in Spanish, with a particular focus on the clinical domain. The model is publicly available. RoBERTime achieves state-of-the-art results on the E3C and Timebank corpora and is the first public model for the detection of temporal expressions in Spanish specialized in the clinical domain.