Corref-PT: A Semi-Automatic Annotated Portuguese Coreference Corpus

  • Authors: Renata Vieira Maia, Amália Mendes, Paulo Quaresma, Evandro B. Fonseca, Sandra Collovini de Abreu, Sandra Antunes
  • Published in: Computación y Sistemas (CyS), ISSN 1405-5546, e-ISSN 2007-9737, Vol. 22, No. 4, 2018, pp. 1259-1267
  • Language: English
  • DOI: 10.13053/cys-22-4-3063
  • Abstract: This paper describes the Portuguese coreference corpus Corref-PT, which was annotated semi-automatically with the coreference annotation tool CORP and manually revised with the editing tool CorrefVisual. It comprises 182 texts, mostly news (from the CSTNews and LE-PAROLE corpora and from FAPESP magazine), but also Wikipedia articles. The resulting corpus contains 3,898 reference chains. We present CORP, a coreference annotation tool based on deterministic rules, and CorrefVisual, the editor used for manual revision. We report on annotation agreement and on the annotators' feedback about the editor and the complexity of the task. We give examples of technical and linguistic issues encountered during annotation and discuss the pros and cons of such an approach to corpus construction. Our motivation was to use a semi-automatic approach to expand the resources available for coreference resolution in Portuguese.

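The abstract states that CORP is built on deterministic rules. As a rough illustration only, the Python sketch below implements one rule of that general kind: an exact-string-match pass, in the style of multi-pass sieve resolvers, that groups mentions into reference chains. It is not CORP's implementation. The function names, the small list of Portuguese determiners, and the input format (a flat list of mention strings) are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed, illustrative set of Portuguese articles stripped before matching.
DETERMINERS = {"o", "a", "os", "as", "um", "uma"}


def normalize(mention: str) -> str:
    """Lowercase a mention and drop leading determiners (hypothetical helper)."""
    tokens = mention.lower().split()
    while tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    return " ".join(tokens)


def exact_match_chains(mentions: list[str]) -> list[list[int]]:
    """Group mention indices whose normalized strings are identical.

    This is a single deterministic rule (hypothetical); a real rule-based
    resolver would apply many further rules. Singleton mentions are dropped,
    so only chains with two or more mentions are returned, in order of
    first appearance.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, mention in enumerate(mentions):
        key = normalize(mention)
        if key:  # skip mentions that are empty after normalization
            groups[key].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = ["A Petrobras", "o presidente", "Petrobras", "a empresa", "O presidente"]
    print(exact_match_chains(sample))
```

Run on the sample, the pass links "A Petrobras" with "Petrobras" and the two occurrences of "presidente", but leaves "a empresa" unlinked. Resolving that kind of case requires the semantic or lexical knowledge that a string-match rule alone does not provide.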
Article metadata obtained from SciELO México.
