, Álvaro Barreiro
, Luis Castillo Vidal
, Juan Manuel Corchado Rodríguez
, Vol. 2, 2007, ISBN 978-84-611-8848-2, pp. 319-329

Web information extraction, and web news extraction in particular, is an open research problem and a key component of NewsIR systems. Current techniques fall short in the quality of their results, incur high computational costs, or require human intervention, all of which are critical issues in a real system. We present an automated approach to news recognition and extraction based on a set of heuristics about article structure, which is currently applied in an operational system. We also built a data set for evaluating web news extraction methods. On this collection of international news, composed of 4869 web pages from 15 different online sources, our approach achieved 97% precision and 94% recall on the news recognition and extraction task.