An effective and efficient web news extraction technique for an operational newsIR system

  • Authors: Javier Parapar, Álvaro Barreiro García
  • Published in: XII Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA 2007). Actas / coordinated by Daniel Borrajo Millán, Luis Castillo Vidal, Juan Manuel Corchado Rodríguez, Vol. 2, 2007, ISBN 978-84-611-8848-2, pp. 319-329
  • Language: English
  • Full text not available
  • Abstract
    • Web information extraction, and web news extraction in particular, is an open research problem and a key component of NewsIR systems. Current techniques fall short in the quality of their results, their high computational cost, or their need for human intervention, all of which are critical issues in a real system. We present an automated approach to news recognition and extraction, based on a set of heuristics about article structure, that is currently applied in an operational system. We also built a data set to evaluate web news extraction methods. On this collection of international news, composed of 4869 web pages from 15 different online sources, our approach achieved 97% precision and 94% recall on the news recognition and extraction task.

