Joseba Fernández de Landa, Rodrigo Agerri Gascón
Names that refer to people, institutions, or places can be defined as named entities. Extracting named entities from news texts helps to identify the topics most discussed in the news media. The main objective of this work is to identify, in real time, the named entities most frequently mentioned in Basque-language online media. To this end, we develop a system that automatically collects news written in Basque and annotates the named entities appearing in it. The annotation is performed with state-of-the-art deep learning models. Finally, the most frequently identified entities are published weekly on a Wikipedia page, in order to show which of them do not currently have an article in the Basque Wikipedia.
The main goal of this work is to identify the salient named entities mentioned in Basque-language media content, performing the identification in real time. To that end, a system has been developed to automatically collect and label named entities from news published in Basque, using state-of-the-art deep learning models. Thanks to the named entity identifier, the named entities in news collected in real time are continuously identified and gathered, building up a record. Finally, the salient named entities identified are published weekly on a Wikipedia page, with the aim of showing the salient entities that have no article in the Basque Wikipedia.
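The final step described above, ranking the identified entities by frequency and keeping those without a Wikipedia article, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `top_missing_entities`, the input shapes, and the case-insensitive title match are all assumptions made for illustration, and the entity names are placeholders. The sketch takes as given that the entities have already been extracted by a tagger and that a set of existing article titles is available.

```python
from collections import Counter


def top_missing_entities(annotated_articles, existing_titles, k=10):
    """Return up to k (entity, count) pairs for entities with no article.

    annotated_articles: iterable of lists of entity strings, one list per
        news item (hypothetical output of a named-entity tagger).
    existing_titles: iterable of article titles already present in the
        target Wikipedia (assumed to be fetched elsewhere).
    """
    counts = Counter()
    for entities in annotated_articles:
        # Every mention counts; counting once per article is another option.
        counts.update(entities)

    # Assumption: an entity "has an article" when a title matches it
    # exactly, ignoring case. Real matching would need redirects, etc.
    known = {title.casefold() for title in existing_titles}

    missing = [
        (name, n)
        for name, n in counts.most_common()
        if name.casefold() not in known
    ]
    return missing[:k]


# Placeholder data, not taken from the paper.
articles = [
    ["Place Y", "Org Z", "Person X"],
    ["Person X", "Place Y"],
    ["Person X"],
]
existing = {"Place Y"}
result = top_missing_entities(articles, existing, k=2)
print(result)
```

Counting every mention favours entities that are repeated within a single article; counting each entity once per article would instead measure how many news items mention it.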