Tailoring a Knowledge Discovery Framework to Process Pharmacologic Documents

Isabel Moreno; Alejandro Piad Morffis; Yoan Gutiérrez Vázquez; Paloma Moreda Pozo

Authors: Isabel Moreno, Alejandro Piad Morffis, Yoan Gutiérrez Vázquez, Paloma Moreda Pozo
Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 74, 2025, pp. 43-54
Language: English
Parallel titles:
- Adaptación de un Marco de Descubrimiento de Conocimiento para Procesar Documentos Farmacológicos
Abstract
- English
  This paper introduces a specialized knowledge discovery framework designed to process technical health documents and extract knowledge from them. The framework extends an existing technology, LETO, by integrating CARMEN, a multilingual entity classification system that infuses health-related semantics into LETO's general-purpose approach. This combination enables the generation of domain-specific knowledge graphs for two languages, Spanish and English. It also provides a means of exploring relationships within the health domain that could otherwise remain undiscovered. The resulting technology is evaluated using standard metrics employed in knowledge discovery tasks, showing how CARMEN contributes to increasing the knowledge discovered by LETO. The generated knowledge graph can thus be leveraged to create explanatory representations, facilitating a more comprehensive articulation of human knowledge and potentially serving, among other purposes, as an educational resource.