Clinical Federated Learning for Private ICD-10 Classification of Electronic Health Records from Several Spanish Hospitals

  • Authors: Nuria Lebeña Muñoz, Alberto Blanco, Arantza Casillas Rubio, Maite Oronoz Anchordoqui, Alicia Pérez Ramírez
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 74, 2025, pp. 33-42
  • Language: English
  • Parallel titles:
    • Aprendizaje clínico Federado para la Clasificación en Base a CIE-10 de Historias Clínicas Electrónicas del Sistema Sanitario Español
  • Abstract
    • Spanish

      Una limitación en la clasificación de Registros Médicos Electrónicos (RMEs) según la Clasificación Internacional de Enfermedades (CIE) es el reto de conseguir grandes cantidades de documentos clínicos en castellano para entrenar modelos del lenguaje eficientes. El aprendizaje federado (FL) permite el entrenamiento independiente de varios modelos y la posterior unificación de los parámetros de cada modelo resultante para generar un modelo unificado sin necesidad de compartir datos sensibles fuera de las instalaciones clínicas. En este trabajo, analizamos la viabilidad de emplear la estrategia de federación en español en el contexto de una división de datos real: datos generados en el mismo periodo que provienen de dos hospitales reales del sistema de salud vasco. También proponemos un método para pre-entrenar el modelo de lenguaje (LM) de manera federada. Aplicamos este método de pre-entrenamiento federado al entrenamiento de BETO y BERT multilingüe. Nuestros hallazgos muestran claramente que es factible llevar a cabo el aprendizaje federado para la clasificación de RMEs en español utilizando datos distribuidos en diferentes hospitales. Además, la técnica propuesta de pre-entrenamiento federado mejora los resultados del modelo sin pre-entrenamiento adicional.

    • English

      A bottleneck in classifying Electronic Health Records (EHRs) according to the International Classification of Diseases (ICD) is the challenge of obtaining large amounts of clinical documents in Spanish for training efficient language models on private health data. The federated learning (FL) strategy enables the independent training of several models and the subsequent unification of the parameters of each resulting model into a single unified model, without the need to share sensitive data outside the clinical facilities. We analyse the feasibility of employing the federation strategy for Spanish in the context of an actual data division: data coming from two real hospitals of the Basque health system and generated in the same period. We also propose a method to further pre-train the language model (LM) in a federated manner, and we apply this federated further pre-training to BETO and multilingual BERT. Our findings clearly show that it is feasible to carry out federated learning for Spanish EHR classification using data spread across different hospitals. Moreover, the proposed federated further pre-training consistently surpasses the results of the model without further pre-training. (A minimal sketch of the federated parameter-averaging step appears after the reference list below.)

  • References
    • Almagro, M., R. M. Unanue, V. Fresno, and S. Montalvo. 2020. ICD-10 Coding of Spanish Electronic Discharge Summaries: An Extreme Classification...
    • Barros, J., M. Rojas, J. Dunstan, and A. Abeliuk. 2022. Divide and conquer: An extreme multi-label classification approach for coding diseases...
    • Berndorfer, S. and A. Henriksson. 2017. Automated diagnosis coding with combined text representations. Stud Health Technol Inform, 235:201–5.
    • Blanco, A., A. Pérez, and A. Casillas. 2021. Exploiting ICD Hierarchy for Classification of EHRs in Spanish Through Multi-Task Transformers....
    • Cañete, J., G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, and J. Pérez. 2020. Spanish pre-trained BERT model and evaluation data. PML4DC at...
    • Carrino, C. P., J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé, J. Silveira-Ocampo, A. Valencia, A. Gonzalez-Agirre, and M....
    • Collado-Montañez, J., M.-T. Martín-Valdivia, and E. Martínez-Cámara. 2025. Data augmentation based on large language models for radiological...
    • De la Iglesia I., M. Vivó, P. Chocrón, G. de Maeztu, K. Gojenola, and A. Atutxa. 2023. An open source corpus and automatic tool for section...
    • Dermouche, M., J. Velcin, R. Flicoteaux, S. Chevret, and N. Taright. 2016. Supervised topic models for diagnosis code assignment to discharge...
    • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding....
    • DeYoung, J., H.-C. Shing, L. Kong, C. Winestock, and C. Shivade. 2022. Entity anchored ICD coding. arXiv preprint arXiv:2208.07444.
    • Duarte, F., B. Martins, C. S. Pinto, and M. J. Silva. 2018. Deep neural models for ICD-10 coding of death certificates and autopsy reports...
    • European Language Grid. 2022. Bio-bert-spanish. Version 1.0.0 (automatically assigned). [Model]. Source: European Language Grid. https://live.european-language-grid.eu/catalogue/ld/14256.
    • Gutiérrez-Fandiño, A., J. Armengol-Estapé, M. Pàmies, J. Llop-Palao, J. Silveira-Ocampo, C. P. Carrino, C. Armentano-Oller, C. Rodriguez-Penagos,...
    • Kim, Y., J. Sun, H. Yu, and X. Jiang. 2017. Federated tensor factorization for computational phenotyping. In Knowledge discovery and data...
    • Konecny, J., H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon. 2016. Federated learning: Strategies for improving communication...
    • Lin, B. Y., C. He, Z. Zeng, H. Wang, Y. Hua, C. Dupuy, R. Gupta, M. Soltanolkotabi, X. Ren, and S. Avestimehr. 2022. FedNLP: Benchmarking federated...
    • Liu, D., D. Dligach, and T. Miller. 2019. Two-stage federated phenotyping and patient representation learning. In Association for Computational Linguistics,...
    • Liu, D. and T. Miller. 2020. Federated pretraining and fine tuning of BERT using clinical notes from multiple silos. CoRR, abs/2002.08562.
    • Luboshnikov, E. and I. Makarov. 2021. Federated learning in named entity recognition. Recent Trends in Analysis of Images, Social Networks...
    • Mou, C. and J. Ren. 2020. Automated ICD-10 code assignment of nonstandard diagnoses via a two-stage framework. Artificial Intelligence in...
    • World Health Organization. 1993. The ICD-10 classification of mental and behavioural disorders: diagnostic criteria for research, volume 2. World...
    • Pires, T., E. Schlinger, and D. Garrette. 2019. How multilingual is multilingual BERT? In Association for Computational Linguistics, pages...
    • Rodríguez-Barroso, N., D. Jiménez-López, M. V. Luzón, F. Herrera, and E. Martínez-Cámara. 2023. Survey on federated learning threats: Concepts,...
    • Solarte-Pabón, O., O. Montenegro, A. García-Barragán, M. Torrente, M. Provencio, E. Menasalvas, and V. Robles. 2023. Transformers for extracting...
    • Teng, F., Y. Liu, T. Li, Y. Zhang, S. Li, and Y. Zhao. 2023. A review on deep neural networks for ICD coding. IEEE Transactions on Knowledge...
    • Tian, Y., Y. Wan, L. Lyu, D. Yao, H. Jin, and L. Sun. 2022. Fedbert: When federated learning meets pre-training. ACM Transactions on Intelligent...
    • van Aken, B., J.-M. Papaioannou, M. Mayrdorfer, K. Budde, F. A. Gers, and A. Löser. 2021. Clinical outcome prediction from admission notes...
    • Xu, J., X. Xi, J. Chen, V. S. Sheng, J. Ma, and Z. Cui. 2022. A survey of deep learning for electronic health records. Applied Sciences, 12(22):11709.
    • Yang, Z., S. Kwon, Z. Yao, and H. Yu. 2023. Multi-label few-shot ICD coding as autoregressive generation with prompt. In Conference on Artificial...

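The abstracts above describe the federation step as training one model per hospital and then unifying the parameters of the resulting models, without moving clinical text between sites. As a rough illustration of that parameter-averaging idea only (not the authors' actual implementation, which additionally performs federated further pre-training of BETO and multilingual BERT), the short Python sketch below computes a FedAvg-style weighted average over per-hospital parameter dictionaries. The names `federated_average`, `local_states`, and `n_examples` are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of FedAvg-style parameter unification, assuming each
# hospital ("silo") trains a local copy of the same model and shares
# only its parameters, never the clinical notes. Names are illustrative.
from typing import Dict, List

import numpy as np


def federated_average(local_states: List[Dict[str, np.ndarray]],
                      n_examples: List[int]) -> Dict[str, np.ndarray]:
    """Average each parameter across silos, weighting every hospital
    by the size of its local training set."""
    total = float(sum(n_examples))
    merged: Dict[str, np.ndarray] = {}
    for name in local_states[0]:
        merged[name] = sum(
            (n / total) * state[name]
            for state, n in zip(local_states, n_examples)
        )
    return merged


# Toy usage: two hospitals contribute one (single-matrix) parameter set each.
hospital_a = {"classifier.weight": np.ones((2, 3))}
hospital_b = {"classifier.weight": np.zeros((2, 3))}
global_state = federated_average([hospital_a, hospital_b], n_examples=[300, 100])
print(global_state["classifier.weight"])  # 0.75 everywhere: hospital A holds 300/400 of the data
```

In a setting like the one described, each hospital would fine-tune its own copy of the classifier locally, export only a parameter dictionary of this form, and receive back the merged parameters for the next federated round.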