


Removing Noisy Mentions for Distant Supervision

  • Authors: Ander Intxaurrondo, Mihai Surdeanu, Oier López de Lacalle Lecuona, Eneko Agirre Bengoa
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 51, 2013, pp. 41-48
  • Language: English
  • Abstract
    • Spanish (translated into English)

      Information Extraction methods based on Distant Supervision use correct tuples to acquire mentions of those tuples, and thereby train a traditional supervised information extraction system. In this article we analyze the sources of noise in the mentions and explore simple methods for filtering out noisy mentions. The results show that combining frequency-based tuple filtering, mutual information, and the removal of mentions far from the centroids of their respective labels significantly improves the results of two information extraction models.

    • English

      Relation Extraction methods based on Distant Supervision rely on true tuples to retrieve noisy mentions, which are then used to train traditional supervised relation extraction methods. In this paper we analyze the sources of noise in the mentions, and explore simple methods to filter out noisy mentions. The results show that a combination of mention frequency cut-off, Pointwise Mutual Information and removal of mentions which are far from the feature centroids of relation labels is able to significantly improve the results of two relation extraction models.
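
The three filters named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the mention format (a dict with hypothetical keys `tuple`, `label` and `features`), the choice to compute PMI between a tuple's first entity and its relation label, the use of Euclidean distance to the label centroid, and the threshold parameters `max_mentions`, `min_pmi` and `max_distance`.

```python
import math
from collections import Counter, defaultdict


def frequency_cutoff(mentions, max_mentions):
    """Drop every mention of a tuple that has more than max_mentions mentions."""
    counts = Counter(m["tuple"] for m in mentions)
    return [m for m in mentions if counts[m["tuple"]] <= max_mentions]


def pmi(joint, count_x, count_y, total):
    """PMI(x, y) = log(p(x, y) / (p(x) * p(y))), estimated from raw counts."""
    return math.log(joint * total / (count_x * count_y))


def pmi_filter(mentions, min_pmi):
    """Keep mentions whose (first entity, label) pair has PMI >= min_pmi.

    Assumption: PMI is measured between the first entity of the tuple and
    the relation label; the abstract does not say which pair is used.
    """
    total = len(mentions)
    entity_counts = Counter(m["tuple"][0] for m in mentions)
    label_counts = Counter(m["label"] for m in mentions)
    joint_counts = Counter((m["tuple"][0], m["label"]) for m in mentions)
    kept = []
    for m in mentions:
        entity, label = m["tuple"][0], m["label"]
        score = pmi(joint_counts[(entity, label)],
                    entity_counts[entity], label_counts[label], total)
        if score >= min_pmi:
            kept.append(m)
    return kept


def centroid_filter(mentions, max_distance):
    """Drop mentions whose feature vector lies far from its label's centroid.

    Assumption: the centroid is the per-label mean feature vector and the
    distance is Euclidean.
    """
    by_label = defaultdict(list)
    for m in mentions:
        by_label[m["label"]].append(m["features"])
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }
    return [
        m for m in mentions
        if math.dist(m["features"], centroids[m["label"]]) <= max_distance
    ]
```

The filters are written as independent functions so they can be chained in any order, for example `centroid_filter(pmi_filter(frequency_cutoff(mentions, 50), 0.0), 2.0)`; the threshold values in that call are placeholders, not values reported by the authors.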

