Unsupervised learning of relation detection patterns

  • Author: Edgard González Pellicer
  • Thesis advisor: Jordi Turmo
  • Defended: at the Universitat Politècnica de Catalunya (UPC), Spain, in 2012
  • Language: English
  • Thesis committee: Horacio Rodríguez Hontoria (chair), Lluís Márquez Villodre (secretary), Roman Yangarber (member), Alessandro Moschitti (member), Eneko Agirre Bengoa (member)
  • Full text not available
  • Abstract
    • Information extraction is the natural language processing area whose goal is to obtain structured data from the relevant information contained in textual fragments.

      Information extraction requires a significant amount of linguistic knowledge. The specificity of this knowledge limits the portability of systems, since a change of language, domain, or style demands costly human effort.

      Machine learning techniques have been applied for decades to overcome this portability bottleneck, progressively reducing the amount of human supervision involved. However, as the availability of large document collections increases, completely unsupervised approaches become necessary to mine the knowledge they contain.

      This thesis proposes incorporating clustering techniques into pattern learning for information extraction, in order to further reduce the supervision involved in the process. In particular, the work focuses on the problem of relation detection. Achieving this goal required, first, considering the different strategies by which this combination could be carried out; second, developing or adapting clustering algorithms suited to our needs; and third, devising pattern learning procedures that incorporate clustering information.

      By the end of this thesis, we had developed and implemented an approach for learning relation detection patterns which, using clustering techniques and minimal human supervision, is competitive with, and even outperforms, other comparable state-of-the-art approaches.

