For the University of Alicante's participation in the first cross-language Geographic Information Retrieval evaluation, we developed a system made up of three modules. One is an Information Retrieval module; the other two are Named Entity Recognition modules, one based on machine learning and one based on knowledge. We carried out several runs with different combinations of these modules to address the monolingual and bilingual tasks. The system performed better in the monolingual task, where it improved on the average by between 48% and 69%. The results are shown and discussed in the paper.
Manuel García-Vega, Miguel A. García-Cumbreras, L. Alfonso Ureña-López, José M. Perea-Ortega, F. Javier Ariza-López, University of Jaén, {mgarcia,magc,laurena,jmperea,fjariza}@ujaen.es; Oscar Ferrández, Antonio Toral, Zornitsa Kozareva, Elisa Noguera, Andrés ...
The aim of GeoCLEF 2005 is to retrieve relevant documents by using geographic tags [2]. The fast development of Geographic Information Systems (GIS) has created a need for Geographic Information Retrieval (GIR) systems that help GIS systems to obtain ...
Abstract. This paper describes the participation of a combined approach in GeoCLEF-2006. We participated in the Monolingual English Task, and we present the joint work of the three groups belonging to the R2D2 project, with a new system, ...
As more and more electronic clinical information becomes easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of the informativeness and formatting of clinical notes, and ended with a detailed study of the overlap between select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. These systems altered informativeness only minimally, and only one system modified formatting. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in the de-identified versions of our corpus, and many of these concepts were PHI that had been erroneously identified as clinical information.
To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that the impact of automated text de-identification on clinical information is small, but not negligible, and that improved disambiguation of clinical acronyms and eponyms could significantly reduce this impact.
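The exact and partial overlap figures above can be illustrated with a minimal sketch. It assumes each annotation is a pair of character offsets (start, end) with an exclusive end, which the abstract does not specify; the names `overlap_kind` and `overlap_rates` are hypothetical and are not taken from the BoB system or the i2b2 tooling.

```python
from typing import Dict, List, Tuple

# Assumption: an annotation is a (start, end) character span, end exclusive.
Span = Tuple[int, int]


def overlap_kind(concept: Span, phi_spans: List[Span]) -> str:
    """Classify one clinical-concept span as 'exact', 'partial', or 'none'."""
    kind = "none"
    for phi in phi_spans:
        if phi == concept:
            # Identical boundaries: the concept exactly overlaps a PHI span.
            return "exact"
        if concept[0] < phi[1] and phi[0] < concept[1]:
            # The spans share at least one character but differ in boundaries.
            kind = "partial"
    return kind


def overlap_rates(concepts: List[Span], phi_spans: List[Span]) -> Dict[str, float]:
    """Fraction of concept spans falling into each overlap category."""
    counts = {"exact": 0, "partial": 0, "none": 0}
    for concept in concepts:
        counts[overlap_kind(concept, phi_spans)] += 1
    total = len(concepts)
    return {k: (v / total if total else 0.0) for k, v in counts.items()}
```

Under these assumptions, the "exact" and "partial" rates play the role of the 0.81% and 1.78% figures reported in the abstract; spans that merely touch at a boundary are counted as non-overlapping.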
In this paper, we present an evaluation of the hybrid best-of-breed automated VHA (Veterans Health Administration) clinical text de-identification system, nicknamed BoB, developed within the VHA Consortium for Healthcare Informatics Research. We also evaluate two available machine learning-based text de-identification systems: MIST and HIDE. Two different clinical corpora were used for this evaluation: a manually annotated VHA corpus, and the 2006 i2b2 de-identification challenge corpus. These experiments focus on the generalizability and portability of the classification models across different document sources. BoB demonstrated good recall (92.6%), satisfactorily prioritizing patient privacy, and also achieved competitive precision (83.6%) for preserving subsequent document interpretability. MIST and HIDE reached very competitive results, in most cases with high precision (92.6% and 93.6%), although recall was sometimes lower than desired for the most sensitive PHI categories.
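As a reminder of how the metrics above relate, the following sketch computes precision, recall, and the standard F-measure from counts of true positives, false positives, and false negatives. The function names are hypothetical, and the `beta` parameter is an illustrative addition: the abstract reports only precision and recall, not how (or whether) they were combined.

```python
from typing import Tuple


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute (precision, recall, F1) from annotation counts."""
    # Precision: share of redacted items that really were PHI.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true PHI items that were redacted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_beta(precision, recall)
```

For example, `f_beta(0.836, 0.926)` combines the precision and recall reported for BoB into a single F1 score, while `beta=2` weights recall more heavily, which matches the stated priority of protecting patient privacy.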
Using a publicly available corpus of clinical texts called MTSamples, originally created to train medical coders and transcriptionists, we applied de-identification annotation guidelines and a schema developed by the Consortium for Healthcare Informatics Research (CHIR). We assess the feasibility of using a Best of Breed (BOB) de-identification tool to reduce the workload associated with building a de-identification annotation layer on this corpus. We also report prevalence estimates, inter-annotator agreement metrics, and F1-measures for annotators and the BOB tool.
For our participation in the second edition of GeoCLEF, we researched the incorporation of geographic knowledge into Geographic Information Retrieval (GIR). Our system is made up of an IR module (IR-n) and a Geographic Knowledge module (Geonames). The results show that adding geographic knowledge has a negative impact on precision. However, because the results improve for some topics, we conclude that this knowledge could be useful, but further research is needed to determine how.
De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
This paper presents an improvement in the temporal expression (TE) recognition phase of a knowledge-based system at a multilingual level. For this purpose, the combination of different approaches to the recognition of temporal expressions is studied. In this work, for the recognition task, a knowledge-based system that recognizes temporal expressions and had been automatically extended to other ...
This article presents the automatic extension of the TERSEO system to other languages, combined with the use of Machine Learning (ML) techniques. Specifically, it addresses the recognition of temporal expressions for Italian, testing two different ML techniques: a Maximum Entropy model and hidden Markov models. Each system was evaluated both independently and in combination, in order to analyze whether the combined system improves on the results of the independent systems without increasing the number of erroneous expressions by the same percentage. The TERSEO system had previously been combined with ML techniques for English, with good results in that case. In this article, the combination of TERSEO's recognition with that of the ML system is evaluated for Italian. The combination of TERSEO with different ML techniques was evaluated, with satisfactory results...