The complexity of the natural language documents used in companies and organizations poses a challenge for information extraction tasks. Among these tasks, Named Entity Recognition (the identification of proper names of people, organizations, or locations) is essential. However, not all named entities that appear in such documents are relevant, nor do they share the same purpose and meaning. That purpose and meaning are defined by the role each entity plays in the document, so retrieving every possible named entity is not useful in certain scenarios. This thesis therefore proposes a hierarchical classification of named entities according to their role, together with the models needed to identify them. It also proposes a method for identifying named entities by role using that hierarchy. Both contributions have been instantiated in a real use case in the legal domain, using leaked emails from a journalistic investigation. Finally, the thesis presents a software library that implements these contributions alongside other information extraction tasks.