Specification of a general linguistic annotation framework and its use in a real context

Xabier Artola Zubillaga; Arantza Díaz de Ilarraza Sánchez; Aitor Sologaistoa Fresno; Aitor Soroa Etxabe

Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 39, 2007, pp. 157-164
Language: Spanish
Abstract
- Spanish
  AWA is a general architecture for representing the linguistic information produced by linguistic processors. Our goal is to define a coherent and flexible representation scheme that serves as the basis for exchanging information between linguistic tools of any kind. Linguistic analyses are represented by means of feature structures following the TEI-P4 guidelines. These structures, and their relations to the other elements that make up an analysis, are part of a data model designed under the object-oriented paradigm. AWA handles the representation of information within a broader architecture that manages the whole process of analysing a corpus. As an example of the usefulness of the model, we explain how it has been applied to the processing of two corpora.
- English
  In this paper we present AWA, a general architecture for representing the linguistic information produced by diverse linguistic processors. Our aim is to establish a coherent and flexible representation scheme that will serve as the basis for the exchange of information. We use TEI-P4 conformant feature structures as the representation scheme for linguistic analyses. A consistent underlying data model, which captures the structure and relations contained in the information to be manipulated, has been identified and implemented as a set of classes following the object-oriented paradigm. As an example of the usefulness of the model, we show how the framework is used in a real context: two corpora have been annotated by means of an application whose aim is to exploit and manipulate the data created by the linguistic processors developed so far.
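The abstract describes linguistic analyses encoded as TEI-P4 feature structures and held in an object-oriented data model. The following is a minimal Python sketch of that idea, assuming a simplified subset of the TEI-P4 feature-structure markup (`<fs>`, `<f>`, `<sym>`, `<str>`). The class names `FS`, `Sym`, `Str` and `Annotation`, and the sample analysis of the token "etxea", are hypothetical illustrations, not AWA's actual classes or API.

```python
# Hypothetical sketch of feature-structure annotations; not the AWA class library.
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Sym:
    """Atomic symbolic value, written as <sym value="..."/>."""
    value: str


@dataclass
class Str:
    """Free-text value, written as <str>...</str>."""
    text: str


@dataclass
class FS:
    """Typed feature structure: an ordered map from feature names to values."""
    type: str
    features: Dict[str, "Value"] = field(default_factory=dict)

    def to_xml(self) -> ET.Element:
        fs_el = ET.Element("fs", {"type": self.type})
        for name, value in self.features.items():
            f_el = ET.SubElement(fs_el, "f", {"name": name})
            if isinstance(value, FS):
                f_el.append(value.to_xml())  # nested (complex) value
            elif isinstance(value, Sym):
                ET.SubElement(f_el, "sym", {"value": value.value})
            elif isinstance(value, Str):
                ET.SubElement(f_el, "str").text = value.text
            else:
                raise TypeError(f"unsupported feature value: {value!r}")
        return fs_el


Value = Union[FS, Sym, Str]


@dataclass
class Annotation:
    """Hypothetical link between a span of the source text and its analysis."""
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    analysis: FS

    def covered_text(self, text: str) -> str:
        return text[self.start:self.end]


# Hypothetical example: one morphological analysis of the token "etxea".
text = "etxea"
ann = Annotation(0, 5, FS("morph", {
    "lemma": Str("etxe"),
    "pos": Sym("noun"),
    "agr": FS("agr", {"case": Sym("absolutive"), "number": Sym("singular")}),
}))
xml_text = ET.tostring(ann.analysis.to_xml(), encoding="unicode")
print(ann.covered_text(text), xml_text)
```

Keeping the text anchor (`Annotation`) separate from the analysis (`FS`) is a design choice of this sketch: it lets an ambiguous token carry several alternative feature structures over the same span without duplicating the span itself.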